Optimizer-Induced Low-Dimensional Drift and Transverse Dynamics in Transformer Training | Tomesphere