Wolkowicz-Styan Upper Bound on the Hessian Eigenspectrum for Cross-Entropy Loss in Nonlinear Smooth Neural Networks
Yuto Omae, Kazuki Sakai, Yohei Kakimoto, Makoto Sasaki, Yusuke Sakai, Hirotaka Takahashi

TL;DR
This paper derives a closed-form upper bound on the largest eigenvalue of the Hessian of the cross-entropy loss in smooth nonlinear neural networks, linking loss sharpness to network parameters and data orthogonality.
Contribution
It provides the first analytical upper bound for the Hessian's maximum eigenvalue in nonlinear multilayer networks, bypassing numerical eigenspectrum computation.
Findings
Derived a closed-form upper bound for the Hessian's maximum eigenvalue.
Expressed the bound as a function of network parameters and data orthogonality.
Offered insights into loss sharpness and generalization in deep learning.
Abstract
Neural networks (NNs) are central to modern machine learning and achieve state-of-the-art results in many applications. However, the relationship between loss geometry and generalization is still not well understood. The local geometry of the loss function near a critical point is well approximated by its quadratic form, obtained through a second-order Taylor expansion. The coefficients of the quadratic term correspond to the Hessian matrix, whose eigenspectrum allows us to evaluate the sharpness of the loss at the critical point. Extensive research suggests that flat critical points generalize better, while sharp ones lead to higher generalization error. Evaluating sharpness, however, requires the Hessian eigenspectrum, and the characteristic equation of a general matrix has no closed-form solution. Therefore, most existing studies on evaluating loss sharpness rely on numerical approximation methods. Existing…
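The paper's closed-form expression is not reproduced in this summary. As a rough illustration of the trace-based bound the title refers to, the sketch below evaluates the generic Wolkowicz-Styan inequality for a real symmetric n x n matrix A: with m = tr(A)/n and s^2 = tr(A^2)/n - m^2, the largest eigenvalue satisfies lambda_max <= m + s * sqrt(n - 1). The function name and the small test matrices are assumptions made for illustration; they stand in for a Hessian and are not taken from the paper.

```python
import math

def wolkowicz_styan_upper(A):
    """Generic Wolkowicz-Styan upper bound on the largest eigenvalue
    of a real symmetric matrix A, given as a list of rows.

    Only tr(A) and tr(A^2) are needed, so no eigensolver is called.
    """
    n = len(A)
    m = sum(A[i][i] for i in range(n)) / n
    # For symmetric A, tr(A^2) equals the sum of squared entries.
    tr_a2 = sum(A[i][j] * A[i][j] for i in range(n) for j in range(n))
    # Clamp at zero to guard against tiny negative rounding errors.
    s = math.sqrt(max(tr_a2 / n - m * m, 0.0))
    return m + s * math.sqrt(n - 1)

# Illustrative stand-ins for a Hessian (hypothetical values).
# [[2, 1], [1, 2]] has eigenvalues 1 and 3.
bound_2x2 = wolkowicz_styan_upper([[2.0, 1.0], [1.0, 2.0]])

# diag(1, 2, 6) has largest eigenvalue 6.
bound_diag = wolkowicz_styan_upper(
    [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 6.0]]
)
```

The bound uses only the first two spectral moments, which is why it can be written in closed form once tr(A) and tr(A^2) are known. For 2 x 2 matrices it is tight; for larger matrices it is generally loose, as the diagonal example shows.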
