Identifying Generalization Properties in Neural Networks
Huan Wang, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

TL;DR
This paper explores the relationship between neural network generalization and local properties of solutions, such as the Hessian, proposing a new metric and algorithm to evaluate and improve generalization based on these properties.
Contribution
It establishes a theoretical connection between model generalization and Hessian-related properties within the PAC-Bayes framework, introducing a practical scoring metric and optimization algorithm.
Findings
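Proves that generalization ability is related to the Hessian, the Lipschitz constant of the Hessian, and the scales of the parameters.
Proposes a metric, guided by the proof, to score a model's generalization capability.
Develops an algorithm that optimizes the perturbed model according to this metric.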
Abstract
While it has not yet been proven, empirical evidence suggests that model generalization is related to local properties of the optima which can be described via the Hessian. We connect model generalization with the local property of a solution under the PAC-Bayes paradigm. In particular, we prove that model generalization ability is related to the Hessian, the higher-order "smoothness" terms characterized by the Lipschitz constant of the Hessian, and the scales of the parameters. Guided by the proof, we propose a metric to score the generalization capability of the model, as well as an algorithm that optimizes the perturbed model accordingly.
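As a rough illustration of why the Hessian enters a perturbation-based (PAC-Bayes style) analysis, the sketch below compares the expected loss under Gaussian parameter noise at a "flat" and a "sharp" minimum. It is not the paper's metric or algorithm. The toy quadratic loss, the diagonal Hessian, the isotropic noise scale `sigma`, and all function names are assumptions made for this example only.

```python
import random

def loss(w, curvatures):
    """Toy quadratic loss 0.5 * sum_i h_i * w_i^2; its Hessian is diag(curvatures)."""
    return 0.5 * sum(h * x * x for h, x in zip(curvatures, w))

def perturbed_loss(w, curvatures, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[loss(w + u)] with u ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        noisy = [x + rng.gauss(0.0, sigma) for x in w]
        total += loss(noisy, curvatures)
    return total / n_samples

def second_order_estimate(w, curvatures, sigma):
    """loss(w) + 0.5 * sigma^2 * trace(H); exact for this quadratic toy loss."""
    return loss(w, curvatures) + 0.5 * sigma ** 2 * sum(curvatures)

# Two hypothetical minima at the same point, differing only in curvature.
w = [0.0, 0.0]
flat = [0.1, 0.2]      # small Hessian eigenvalues
sharp = [10.0, 20.0]   # large Hessian eigenvalues
sigma = 0.1

# Gap between perturbed and unperturbed loss at each minimum.
flat_gap = perturbed_loss(w, flat, sigma) - loss(w, flat)
sharp_gap = perturbed_loss(w, sharp, sigma) - loss(w, sharp)
print(flat_gap, sharp_gap)
```

In this toy setting the gap grows with the trace of the Hessian, so the sharp minimum is penalized roughly 100 times more than the flat one for the same noise scale. For a non-quadratic loss the second-order estimate is only approximate, and the error is governed by higher-order smoothness terms such as the Lipschitz constant of the Hessian mentioned in the abstract.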
Taxonomy
Topics: Neural Networks and Applications · Machine Learning in Materials Science · Domain Adaptation and Few-Shot Learning
