Loading paper
Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers | Tomesphere