Training Dynamics of Softmax Self-Attention: Fast Global Convergence via Preconditioning