A Boundary-Layer Mechanism for One-Third Scaling in Online Softmax Classification
Marcel Kühn, Yoon Thelge, Bernd Rosenow

TL;DR
This paper uncovers an asymptotic boundary-layer mechanism explaining the one-third power-law scaling in online softmax classification, highlighting how boundary effects influence learning curves and generalization.
Contribution
It introduces a novel boundary-layer analysis revealing the slow power-law decay in test loss and error, extending understanding of neural scaling laws beyond spectral explanations.
Findings
Late-time solutions exhibit a power law in training time with exponent $-\frac{1}{3}$ for test loss and error.
Learning-rate schedules can improve the power-law exponent to $-\frac{1}{2}$.
Simulations and experiments support the boundary-layer dynamics and their impact on learning curves.
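As a rough illustration of how an exponent such as -1/3 or -1/2 might be read off a learning curve, the sketch below fits a straight line in log-log coordinates. The function name `fit_power_law_exponent` and the synthetic curve are assumptions made for this example; they are not data or code from the paper.

```python
import numpy as np

def fit_power_law_exponent(t, y):
    """Fit y ~ C * t**p by least squares on (log t, log y); return (p, C)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # A power law is a straight line in log-log coordinates: the slope is
    # the exponent p and the intercept is log C.
    slope, intercept = np.polyfit(np.log(t), np.log(y), 1)
    return slope, np.exp(intercept)

# Synthetic late-time curve with exponent -1/3 (illustrative only).
t = np.logspace(1, 5, 50)
y = 2.0 * t ** (-1.0 / 3.0)
exponent, prefactor = fit_power_law_exponent(t, y)
```

On measured curves the fit would normally be restricted to the late-time window, since the power law is an asymptotic statement and early transients bias the slope.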
Abstract
Hard-label classification is usually trained with smooth surrogate losses, most prominently softmax cross-entropy. We isolate an asymptotic mechanism by which this mismatch between smooth surrogate and discrete labels produces power-law learning curves in an online teacher-student model. After subtracting the mean logit, the thermodynamic-limit dynamics close in centered variables: a growing centered student-teacher alignment and the residual student variance. At late times, examples away from teacher decision boundaries are already classified confidently and contribute exponentially little. Only boundary layers around those decision boundaries remain active, while the noise of fixed-learning-rate online gradient descent maintains a nonzero residual variance. As a function of the training time, the late-time solution yields a power law not only for the test loss but also…
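The setting described in the abstract can be sketched with a small simulation: a linear teacher assigns hard labels to fresh Gaussian inputs, and a linear student is trained by fixed-learning-rate online gradient descent on softmax cross-entropy. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian inputs, the `1/sqrt(n_features)` logit scaling, the function name `run_online_softmax`, and all parameter values.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def run_online_softmax(n_features=50, n_classes=3, n_steps=2000,
                       lr=0.5, n_test=2000, seed=0):
    """Online SGD on softmax cross-entropy with hard teacher labels (sketch)."""
    rng = np.random.default_rng(seed)
    teacher = rng.standard_normal((n_classes, n_features))
    student = np.zeros((n_classes, n_features))
    # Fixed held-out set used to estimate the test error.
    x_test = rng.standard_normal((n_test, n_features))
    y_test = np.argmax(x_test @ teacher.T, axis=1)
    scale = 1.0 / np.sqrt(n_features)  # assumed logit normalization
    errors = []
    for step in range(n_steps):
        x = rng.standard_normal(n_features)   # one fresh example per step
        label = np.argmax(teacher @ x)        # hard label from the teacher
        grad_logits = softmax(scale * (student @ x))
        grad_logits[label] -= 1.0             # d(loss)/d(logits) = p - onehot
        student -= lr * scale * np.outer(grad_logits, x)
        if (step + 1) % 100 == 0:
            pred = np.argmax(x_test @ student.T, axis=1)
            errors.append(np.mean(pred != y_test))
    return student, np.array(errors)

student, errors = run_online_softmax()
```

Because the logit gradient `p - onehot` sums to zero over classes, a student initialized at zero keeps a zero class-mean weight vector throughout training, which is one simple way to see why working in mean-subtracted (centered) logits is natural in this setting.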
