A Boundary-Layer Mechanism for One-Third Scaling in Online Softmax Classification

Marcel Kühn; Yoon Thelge; Bernd Rosenow

arXiv:2605.22341·cs.LG·May 22, 2026

TL;DR

This paper identifies an asymptotic boundary-layer mechanism that explains the one-third power-law scaling of learning curves in online softmax classification: at late times, only examples near the teacher decision boundaries still contribute to learning, and this sets the decay of both test loss and test error.

Contribution

The paper introduces a boundary-layer analysis that accounts for the slow power-law decay of test loss and test error, offering a mechanism for neural scaling laws that is distinct from spectral explanations.

Findings

01

Late-time solutions exhibit an $α^{-1/3}$ power law for test loss and error.

02

Learning-rate schedules can improve the power law to $α^{-1/2}$.

03

Simulations and experiments support the boundary-layer dynamics and their impact on learning curves.
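
The exponents quoted in the findings can be read off a learning curve as the slope of a straight-line fit in log-log coordinates. The following is a minimal sketch of that check, not the authors' analysis code: the helper name `fit_power_law_exponent`, the prefactor 2.0, and the range of $α$ are hypothetical, and the curves are synthetic and noise-free, not data from the paper.

```python
import numpy as np

def fit_power_law_exponent(alpha, loss):
    """Fit loss ~ c * alpha**slope by least squares in log-log space.

    Returns (slope, c). Hypothetical helper for illustration only.
    """
    slope, intercept = np.polyfit(np.log(alpha), np.log(loss), 1)
    return slope, np.exp(intercept)

# Synthetic curves (assumed shapes, not results from the paper).
alpha = np.logspace(2, 6, 50)                    # training time
loss_fixed_lr = 2.0 * alpha ** (-1.0 / 3.0)      # fixed learning rate
loss_scheduled = 2.0 * alpha ** (-1.0 / 2.0)     # with a learning-rate schedule

slope_fixed, prefactor_fixed = fit_power_law_exponent(alpha, loss_fixed_lr)
slope_sched, prefactor_sched = fit_power_law_exponent(alpha, loss_scheduled)
print(f"fixed-rate slope: {slope_fixed:.4f}, scheduled slope: {slope_sched:.4f}")
```

On measured curves the fit should be restricted to the late-time window, since the power law is an asymptotic statement and early-time transients would bias the slope.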

Abstract

Hard-label classification is usually trained with smooth surrogate losses, most prominently softmax cross-entropy. We isolate an asymptotic mechanism by which this mismatch between smooth surrogate and discrete labels produces power-law learning curves in an online teacher-student model. After subtracting the mean logit, the thermodynamic-limit dynamics close in centered variables: a growing centered student-teacher alignment $D$ and the residual student variance $Δ$. At late times, examples away from teacher decision boundaries are already classified confidently and contribute exponentially little. Only boundary layers of width $O(D^{-1})$ remain active, while the noise of fixed-learning-rate online gradient descent maintains a nonzero $Δ$. As a function of the training time $α$, the late-time solution yields an $α^{-1/3}$ power law not only for the test loss but also…
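
The mean-logit subtraction mentioned in the abstract relies on a general property of softmax: adding the same constant to every logit leaves the class probabilities, and therefore the cross-entropy, unchanged. The sketch below only illustrates this shift invariance. It does not reproduce the paper's closed dynamics for $D$ and $Δ$, and the function name, logit values, and label are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against a hard label (class index)."""
    shifted = logits - np.max(logits)                 # shift for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

rng = np.random.default_rng(0)
logits = rng.normal(size=5)          # arbitrary example logits
label = 2                            # arbitrary hard label

centered = logits - logits.mean()    # subtract the mean logit

loss_raw = softmax_cross_entropy(logits, label)
loss_centered = softmax_cross_entropy(centered, label)
print(loss_raw, loss_centered)       # the two values agree up to rounding
```

Because the loss depends only on logit differences, the mean logit carries no information about the prediction, which is consistent with the abstract's description of dynamics that close in centered variables.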
