Neglected Hessian component explains mysteries in Sharpness regularization
Yann N. Dauphin, Atish Agarwala, Hossein Mobahi

TL;DR
This paper investigates the role of the neglected Hessian component, the Nonlinear Modeling Error matrix, in sharpness regularization, revealing its importance in understanding why certain regularization methods improve generalization in deep learning.
Contribution
The study introduces a new perspective on the Hessian decomposition, emphasizing the significance of the NME in regularization and challenging the assumed equivalence between weight noise and gradient penalties.
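For readers who want the decomposition written out, the following is a sketch of the standard Gauss-Newton split of the loss Hessian, which is the form the summary appears to refer to. The symbols are assumptions introduced here, not taken from the page: $\theta$ are the parameters, $z(\theta)$ the model outputs, $\ell(z, y)$ the per-example loss, and $J = \partial z / \partial \theta$ the output Jacobian.

```latex
\nabla_\theta^2 \, \ell\bigl(z(\theta), y\bigr)
  = \underbrace{J^\top \bigl(\nabla_z^2 \ell\bigr) J}_{\text{Gauss-Newton term}}
  + \underbrace{\sum_i \frac{\partial \ell}{\partial z_i} \, \nabla_\theta^2 z_i}_{\text{Nonlinear Modeling Error (NME)}}
```

The second term is weighted by the output gradient $\partial \ell / \partial z_i$, so it is zero whenever that gradient is zero, which is consistent with the abstract's remark that the NME vanishes at interpolation.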
Findings
NME explains sensitivity of gradient penalties to activation functions
Regularizing feature exploitation improves performance
Weight noise and gradient penalties are not equivalent in modern networks
Abstract
Recent work has shown that methods like SAM which either explicitly or implicitly penalize second order information can improve generalization in deep learning. Seemingly similar methods like weight noise and gradient penalties often fail to provide such benefits. We show that these differences can be explained by the structure of the Hessian of the loss. First, we show that a common decomposition of the Hessian can be quantitatively interpreted as separating the feature exploitation from feature exploration. The feature exploration, which can be described by the Nonlinear Modeling Error matrix (NME), is commonly neglected in the literature since it vanishes at interpolation. Our work shows that the NME is in fact important as it can explain why gradient penalties are sensitive to the choice of activation function. Using this insight we design interventions to improve performance. We…
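The decomposition described in the abstract can be checked numerically on a toy model. The sketch below is not the authors' code: the network shape, the squared loss, and the names `model`, `loss_from_output`, and `decompose` are all hypothetical choices made for illustration. It splits the full Hessian of a one-hidden-layer scalar-output network into a Gauss-Newton term and an NME term, then confirms that the two add up to the full Hessian.

```python
import jax
import jax.numpy as jnp


def model(theta, x, act):
    # Hypothetical toy net: theta packs W1 (3x2) followed by w2 (3,).
    W1 = theta[:6].reshape(3, 2)
    w2 = theta[6:]
    return jnp.dot(w2, act(W1 @ x))  # scalar output z


def loss_from_output(z, y):
    # Squared loss on the scalar output (an assumption for this sketch).
    return 0.5 * (z - y) ** 2


def decompose(theta, x, y, act):
    """Return (full Hessian, Gauss-Newton term, NME term) w.r.t. theta."""
    f = lambda t: model(t, x, act)
    full = jax.hessian(lambda t: loss_from_output(f(t), y))(theta)

    z = f(theta)
    J = jax.grad(f)(theta)                            # dz/dtheta, shape (9,)
    g_z = jax.grad(loss_from_output)(z, y)            # dl/dz
    h_z = jax.grad(jax.grad(loss_from_output))(z, y)  # d2l/dz2

    gn = h_z * jnp.outer(J, J)         # J^T (d2l/dz2) J
    nme = g_z * jax.hessian(f)(theta)  # (dl/dz) * d2z/dtheta2
    return full, gn, nme


theta = jnp.arange(1.0, 10.0) / 10.0
x = jnp.array([0.5, -1.0])
y = 2.0

full, gn, nme = decompose(theta, x, y, jnp.tanh)
print("max |full - (gn + nme)|:", float(jnp.abs(full - (gn + nme)).max()))
```

In this toy setting, setting the target `y` equal to the model output makes the NME term zero, since it is scaled by the output gradient. Swapping `jnp.tanh` for `jax.nn.relu` also zeroes the first-layer block `nme[:6, :6]`, because that block is proportional to the second derivative of the activation. This is only an illustration of why the NME can depend on the activation function, not a reproduction of the paper's experiments.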
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Neural Networks and Applications · Machine Learning and ELM
Methods: Sharpness-Aware Minimization (SAM)
