Identifying Generalization Properties in Neural Networks

Huan Wang; Nitish Shirish Keskar; Caiming Xiong; Richard Socher

arXiv:1809.07402·cs.LG·September 21, 2018·39 cites

TL;DR

This paper explores the relationship between neural network generalization and local properties of solutions, such as the Hessian, proposing a new metric and algorithm to evaluate and improve generalization based on these properties.

Contribution

It establishes a theoretical connection between model generalization and Hessian-related properties within the PAC-Bayes framework, introducing a practical scoring metric and optimization algorithm.
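
For orientation, the kind of local expansion that ties a perturbed loss to the Hessian and to a Lipschitz constant of the Hessian can be written as below. This is the standard second-order Taylor bound, not a statement copied from the paper; the symbols $L$ (loss), $w$ (parameters), $u$ (perturbation), and $\rho$ (Hessian Lipschitz constant) are notation assumed here for illustration.

```latex
% Assumption: the Hessian of L is \rho-Lipschitz, i.e.
%   \|\nabla^2 L(w_1) - \nabla^2 L(w_2)\| \le \rho \, \|w_1 - w_2\|  for all w_1, w_2.
% Then, for any perturbation u,
\[
  \left| L(w + u) - L(w) - \nabla L(w)^{\top} u
         - \tfrac{1}{2}\, u^{\top} \nabla^2 L(w)\, u \right|
  \;\le\; \frac{\rho}{6}\, \|u\|^{3} .
\]
```

Read this way, the Hessian controls the second-order change in the loss under a perturbation, and $\rho$ bounds what the quadratic approximation leaves out.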

Findings

01

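Proves, under the PAC-Bayes paradigm, that generalization ability is related to the Hessian, the Lipschitz constant of the Hessian, and the scales of the parameters.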

02

Proposes a metric to score model generalization.

03

Develops an algorithm that optimizes the perturbed model, guided by the same analysis.

Abstract

While it has not yet been proven, empirical evidence suggests that model generalization is related to local properties of the optima which can be described via the Hessian. We connect model generalization with the local property of a solution under the PAC-Bayes paradigm. In particular, we prove that model generalization ability is related to the Hessian, the higher-order "smoothness" terms characterized by the Lipschitz constant of the Hessian, and the scales of the parameters. Guided by the proof, we propose a metric to score the generalization capability of the model, as well as an algorithm that optimizes the perturbed model accordingly.
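
The idea that the Hessian describes how a solution behaves under parameter perturbation can be illustrated with a small numerical sketch. The code below is not the paper's metric or algorithm. It assumes a toy quadratic loss and independent uniform perturbations, and all function names (`quadratic_loss`, `hessian_diag_fd`, `perturbed_loss_taylor`, `perturbed_loss_mc`) are hypothetical. It compares a Hessian-based estimate of the expected perturbed loss with a Monte Carlo estimate.

```python
import numpy as np


def quadratic_loss(w, A):
    # Toy loss 0.5 * w^T A w, chosen only for illustration (an assumption).
    return 0.5 * w @ A @ w


def hessian_diag_fd(f, w, eps=1e-3):
    # Central finite-difference estimate of the Hessian diagonal of f at w.
    base = f(w)
    diag = np.empty_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        diag[i] = (f(w + step) - 2.0 * base + f(w - step)) / eps**2
    return diag


def perturbed_loss_taylor(f, w, sigma):
    # Second-order estimate of E[f(w + u)] with u_i ~ Uniform(-sigma_i, sigma_i),
    # independent across coordinates. The first-order term vanishes because
    # E[u] = 0, cross terms vanish by independence, and E[u_i^2] = sigma_i^2 / 3.
    return f(w) + 0.5 * np.sum(hessian_diag_fd(f, w) * sigma**2 / 3.0)


def perturbed_loss_mc(f, w, sigma, n_samples=20000, seed=0):
    # Monte Carlo estimate of the same expectation.
    rng = np.random.default_rng(seed)
    u = rng.uniform(-sigma, sigma, size=(n_samples, w.size))
    return float(np.mean([f(w + ui) for ui in u]))


# One sharp direction (curvature 10) and one flat direction (curvature 0.1).
A = np.diag([10.0, 0.1])
f = lambda w: quadratic_loss(w, A)
w_star = np.zeros(2)            # minimizer of the toy loss
sigma = np.array([0.1, 0.1])    # perturbation half-widths (assumed values)

taylor_estimate = perturbed_loss_taylor(f, w_star, sigma)
mc_estimate = perturbed_loss_mc(f, w_star, sigma)
print(taylor_estimate, mc_estimate)
```

For a quadratic loss the second-order estimate is exact up to finite-difference error, so the two numbers should agree closely. Almost all of the increase comes from the high-curvature coordinate, which is the intuition behind relating generalization to the Hessian and to the perturbation scale.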

Taxonomy

Topics: Neural Networks and Applications · Machine Learning in Materials Science · Domain Adaptation and Few-Shot Learning