The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity