Spectral Flattening Is All Muon Needs: How Orthogonalization Controls Learning Rate and Convergence