Clustering Head: A Visual Case Study of the Training Dynamics in Transformers