Fast Gauss-Newton for Multiclass Cross-Entropy
Mikalai Korbit, Mario Zanon

TL;DR
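The paper introduces Fast Gauss-Newton (FGN), a curvature approximation for multiclass softmax cross-entropy that keeps the true-vs-rest term of the generalized Gauss-Newton (GGN) matrix and drops the within-competitor covariance term. The loss and gradient are unchanged, and the approximation is exact for binary classification.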
Contribution
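The paper gives an exact decomposition of the multiclass GGN into a true-vs-rest term and a positive semidefinite within-competitor covariance term, and proposes FGN, which keeps only the first term, as a positive semidefinite under-approximation whose structure makes the damped update cheaper to scale.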
Findings
FGN closely approximates the full GGN when competitor mass is concentrated.
FGN deviates as within-competitor covariance increases.
The method enables matrix-free conjugate gradient updates for scalable optimization.
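The first two findings can be checked on a toy example. The sketch below works in logit space only (no network Jacobian) and uses hypothetical helper names (`curvature_terms`, `frob`) and hand-picked logits that are not taken from the paper. It assumes the dropped term is (1 - p_y) times the covariance of the competitor distribution q, which is what the softmax covariance leaves after the true-vs-rest rank-one part is removed, and it measures the gap between the full curvature and the retained term in two cases with the same total competitor mass.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def curvature_terms(z, y):
    """Logit-space curvature pieces for true class y (illustrative helper)."""
    k = len(z)
    p = softmax(z)
    rest = sum(p[j] for j in range(k) if j != y)        # total competitor mass
    q = [0.0 if j == y else p[j] / rest for j in range(k)]  # competitor distribution
    a = [(1.0 if j == y else 0.0) - q[j] for j in range(k)]  # gradient of the margin
    # Full softmax covariance: diag(p) - p p^T
    full = [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(k)]
            for i in range(k)]
    # Retained true-vs-rest term: p_y (1 - p_y) a a^T
    tvr = [[p[y] * rest * a[i] * a[j] for j in range(k)] for i in range(k)]
    # Dropped term: (1 - p_y) (diag(q) - q q^T)
    within = [[rest * ((q[i] if i == j else 0.0) - q[i] * q[j]) for j in range(k)]
              for i in range(k)]
    return full, tvr, within

def frob(m):
    return math.sqrt(sum(v * v for row in m for v in row))

def max_abs_residual(full, tvr, within):
    k = len(full)
    return max(abs(full[i][j] - tvr[i][j] - within[i][j])
               for i in range(k) for j in range(k))

# True class is index 0 in both cases; competitor mass is about 0.5 in each.
concentrated = [1.0, 1.0, -9.0, -9.0]          # one dominant competitor
c = 1.0 - math.log(3.0)
spread = [1.0, c, c, c]                        # three equal competitors

full_c, tvr_c, within_c = curvature_terms(concentrated, 0)
full_s, tvr_s, within_s = curvature_terms(spread, 0)

residual_c = max_abs_residual(full_c, tvr_c, within_c)  # decomposition error
residual_s = max_abs_residual(full_s, tvr_s, within_s)
gap_concentrated = frob(within_c)   # size of the term FGN drops
gap_spread = frob(within_s)

print("decomposition residuals:", residual_c, residual_s)
print("dropped-term norm, concentrated vs spread:", gap_concentrated, gap_spread)
```

With one dominant competitor the dropped term is close to zero, while equal competitors give a visibly larger gap, which matches the qualitative statements above for this toy setting.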
Abstract
In multiclass softmax cross-entropy, the full generalized Gauss-Newton (GGN) curvature couples all output logits through the softmax covariance, making curvature-vector products harder to scale as the number of classes grows. We show that the standard multiclass GGN can be decomposed exactly into a true-vs-rest term and a positive semidefinite within-competitor covariance term. Fast Gauss-Newton (FGN) retains the first term and drops the second, yielding a positive semidefinite under-approximation of the multiclass GGN that is exact for binary classification. The derivation uses an exact true-vs-rest scalar-margin representation of softmax cross-entropy: the loss and gradient are unchanged, and the approximation enters only at the curvature level. Exploiting the FGN curvature structure, the damped update can be written as an equivalent whitened row-space system with one row per…
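As a reading aid, the decomposition described in the abstract can be written out in logit space. The notation below (z, p, q, a, m) is assumed here and may differ from the paper's; the parameter-space GGN follows by sandwiching each term between the network Jacobian and its transpose.

```latex
% Logits z \in \mathbb{R}^K, p = \mathrm{softmax}(z), true class y.
% Competitor distribution: q_j = p_j / (1 - p_y) for j \neq y, and q_y = 0.
% True-vs-rest margin:
m = z_y - \log \sum_{j \neq y} e^{z_j},
\qquad
\ell = \log\!\left(1 + e^{-m}\right) = -\log p_y,
\qquad
p_y = \sigma(m).
% Margin gradient with respect to the logits:
a = \nabla_z m = e_y - q.
% Exact split of the softmax covariance:
\operatorname{diag}(p) - p p^\top
  = \underbrace{p_y (1 - p_y)\, a a^\top}_{\text{true-vs-rest (kept by FGN)}}
  + \underbrace{(1 - p_y)\left(\operatorname{diag}(q) - q q^\top\right)}_{\text{within-competitor covariance (dropped)}}.
```

The second term is a nonnegative multiple of a covariance matrix, so it is positive semidefinite and dropping it gives an under-approximation. For K = 2 the competitor distribution q is a point mass, the second term vanishes, and the kept term equals the full curvature.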
