Neglected Hessian component explains mysteries in Sharpness regularization

Yann N. Dauphin; Atish Agarwala; Hossein Mobahi

arXiv:2401.10809 · cs.LG · January 26, 2024 · 1 cites

Open Access

TL;DR

This paper investigates the role of a commonly neglected Hessian component, the Nonlinear Modeling Error (NME) matrix, in sharpness regularization, and shows that it helps explain why some regularization methods improve generalization in deep learning while seemingly similar ones do not.

Contribution

The study introduces a new perspective on the Hessian decomposition, emphasizing the significance of the NME in regularization and challenging the assumed equivalence between weight noise and gradient penalties.
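For readers who want the decomposition spelled out, the following is a sketch in standard notation rather than the paper's own: the symbols θ (parameters), z (model outputs), J (Jacobian of the outputs with respect to the parameters), H_z (Hessian of the loss with respect to the outputs), and σ (weight-noise scale) are assumptions introduced here for illustration.

```latex
% Chain rule applied twice to the loss L(z(\theta)):
\nabla_\theta^2 \mathcal{L}
  = \underbrace{J^\top H_z J}_{\text{Gauss-Newton term (feature exploitation)}}
  + \underbrace{\sum_i \frac{\partial \mathcal{L}}{\partial z_i}\,
      \nabla_\theta^2 z_i}_{\text{NME (feature exploration)}},
\qquad
J = \frac{\partial z}{\partial \theta},
\quad
H_z = \nabla_z^2 \mathcal{L}.

% Weight noise \epsilon \sim \mathcal{N}(0, \sigma^2 I), second-order Taylor expansion:
\mathbb{E}_\epsilon\!\left[\mathcal{L}(\theta + \epsilon)\right]
  \approx \mathcal{L}(\theta)
  + \frac{\sigma^2}{2}\,\operatorname{tr}\!\left(J^\top H_z J\right)
  + \frac{\sigma^2}{2}\,\operatorname{tr}\!\left(\mathrm{NME}\right).
```

On this reading, dropping the last trace leaves a penalty built from first derivatives of the model only, so any equivalence between weight noise and gradient-type penalties would seem to depend on the NME being negligible.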

Findings

01. NME explains sensitivity of gradient penalties to activation functions

02. Regularizing feature exploitation improves performance

03. Weight noise and gradient penalties are not equivalent in modern networks
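One possible way to picture the first finding is that the NME contains second derivatives of the model output, and those include the second derivative of the activation function. The sketch below is a toy illustration under stated assumptions, not the paper's experiment: the one-unit model z = w2 * act(w1 * x), the squared-error loss, the function names, and the numeric values are all hypothetical.

```python
import math


def relu_curvature(u):
    """Second derivative of ReLU: zero everywhere except the kink at u = 0."""
    return 0.0


def tanh_curvature(u):
    """Second derivative of tanh: -2 * tanh(u) * (1 - tanh(u)^2)."""
    t = math.tanh(u)
    return -2.0 * t * (1.0 - t * t)


def softplus_curvature(u):
    """Second derivative of softplus log(1 + e^u): sigmoid(u) * (1 - sigmoid(u))."""
    s = 1.0 / (1.0 + math.exp(-u))
    return s * (1.0 - s)


def nme_w1_entry(curvature, w1, w2, x, residual):
    """(w1, w1) entry of the NME for the toy model z = w2 * act(w1 * x).

    d2z/dw1^2 = w2 * x^2 * act''(w1 * x), and the NME scales it by the
    output-space loss gradient dL/dz, which is the residual for squared error.
    The mixed (w1, w2) entry depends on act' instead and is not shown here.
    """
    return residual * w2 * x * x * curvature(w1 * x)


# Hypothetical parameter values, input, and residual (prediction minus target).
w1, w2, x, residual = 0.7, -1.3, 0.9, -1.1

entries = {
    "relu": nme_w1_entry(relu_curvature, w1, w2, x, residual),
    "tanh": nme_w1_entry(tanh_curvature, w1, w2, x, residual),
    "softplus": nme_w1_entry(softplus_curvature, w1, w2, x, residual),
}
for name, value in entries.items():
    print(f"{name:9s} NME[w1, w1] = {value:.6f}")
```

In this toy setting the activation-curvature entry is identically zero for ReLU and nonzero for the smooth activations, so any method whose update touches the NME would see different information depending on the activation.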

Abstract

Recent work has shown that methods like SAM which either explicitly or implicitly penalize second order information can improve generalization in deep learning. Seemingly similar methods like weight noise and gradient penalties often fail to provide such benefits. We show that these differences can be explained by the structure of the Hessian of the loss. First, we show that a common decomposition of the Hessian can be quantitatively interpreted as separating the feature exploitation from feature exploration. The feature exploration, which can be described by the Nonlinear Modeling Error matrix (NME), is commonly neglected in the literature since it vanishes at interpolation. Our work shows that the NME is in fact important as it can explain why gradient penalties are sensitive to the choice of activation function. Using this insight we design interventions to improve performance. We…
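The decomposition described in the abstract can be checked numerically on a very small example. The following is a minimal sketch, assuming a hypothetical two-parameter model z = w2 * tanh(w1 * x) with squared-error loss; the model, the function names, and the numeric values are illustrative choices and do not come from the paper. It compares the sum of the Gauss-Newton and NME terms against a finite-difference Hessian, then evaluates the NME at interpolation.

```python
import math


def loss(theta, x, y):
    """Squared-error loss 0.5 * (z - y)^2 for the toy model z = w2 * tanh(w1 * x)."""
    w1, w2 = theta
    z = w2 * math.tanh(w1 * x)
    return 0.5 * (z - y) ** 2


def decompose(theta, x, y):
    """Return (gauss_newton, nme) as 2x2 nested lists for the toy model."""
    w1, w2 = theta
    t = math.tanh(w1 * x)
    s = 1.0 - t * t                      # derivative of tanh at w1 * x
    residual = w2 * t - y                # dL/dz for squared error
    jac = [w2 * x * s, t]                # dz/dw1, dz/dw2
    # Gauss-Newton term: J^T (d2L/dz2) J, with d2L/dz2 = 1 for squared error.
    gn = [[jac[i] * jac[j] for j in range(2)] for i in range(2)]
    # Second derivatives of the model output with respect to the parameters.
    d2z = [[-2.0 * w2 * x * x * t * s, x * s],
           [x * s, 0.0]]
    # NME term: output-space loss gradient times the model's own curvature.
    nme = [[residual * d2z[i][j] for j in range(2)] for i in range(2)]
    return gn, nme


def numerical_hessian(theta, x, y, h=1e-3):
    """Central finite-difference Hessian of the loss, used only as a cross-check."""
    n = len(theta)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                p = list(theta)
                p[i] += si * h
                p[j] += sj * h
                return loss(p, x, y)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return hess


# Hypothetical parameters, input, and target.
theta, x, y = [0.7, -1.3], 0.9, 0.4
gn, nme = decompose(theta, x, y)
hess = numerical_hessian(theta, x, y)
max_gap = max(abs(gn[i][j] + nme[i][j] - hess[i][j])
              for i in range(2) for j in range(2))
print("max |GN + NME - H| =", max_gap)

# At interpolation the prediction equals the target, so the residual is zero
# and every NME entry is zero, while the Gauss-Newton term is unchanged.
y_fit = theta[1] * math.tanh(theta[0] * x)
_, nme_fit = decompose(theta, x, y_fit)
print("NME at interpolation:", nme_fit)
```

Away from interpolation the NME entries in this toy example are comparable in size to the Gauss-Newton entries, which is consistent with the abstract's point that the term is only negligible when the residual is small.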

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Neural Networks and Applications · Machine Learning and ELM

Methods: Sharpness-Aware Minimization (SAM)