Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes

Lei Wu; Zhanxing Zhu; Weinan E

arXiv:1706.10239 · cs.LG · November 29, 2017 · 125 cites

Open Access

TL;DR

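This paper investigates why deep neural networks generalize well despite overparameterization. It argues that, in the loss landscape, the basins of attraction of good minima occupy far more volume than those of poor minima, so optimizers started from random initialization tend to converge to good minima. The claim is supported by theoretical analysis of 2-layer networks and by empirical evidence.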

Contribution

It provides a systematic analysis of loss landscape characteristics that explain deep learning generalization, including theoretical insights for 2-layer networks and extensive numerical validation for deeper models.

Findings

01

Good minima have larger basins of attraction in the loss landscape than poor minima.

02

Low-complexity solutions exhibit small Hessian norms.

03

Numerical experiments on deeper networks are consistent with the theoretical analysis of 2-layer networks.
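As an illustration of what a "small Hessian norm" means in finding 02, the sketch below estimates the Hessian of a loss by central finite differences and reports its Frobenius norm. It is a toy example and not the paper's procedure: the two quadratic losses, the helper names `hessian_fd` and `frobenius_norm`, and the choice of the Frobenius norm are all assumptions made here for illustration.

```python
import math

def hessian_fd(f, x, eps=1e-4):
    """Central finite-difference estimate of the Hessian of f at x (list of floats)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f after perturbing coordinates i and j.
                y = list(x)
                y[i] += di
                y[j] += dj
                return f(y)
            H[i][j] = (shifted(eps, eps) - shifted(eps, -eps)
                       - shifted(-eps, eps) + shifted(-eps, -eps)) / (4 * eps * eps)
    return H

def frobenius_norm(H):
    """Square root of the sum of squared matrix entries."""
    return math.sqrt(sum(v * v for row in H for v in row))

# Hypothetical toy losses: both have their minimum at the origin with loss 0,
# but the curvature around the minimum differs by a factor of 100.
def flat_loss(w):
    return 0.5 * 0.1 * (w[0] ** 2 + w[1] ** 2)

def sharp_loss(w):
    return 0.5 * 10.0 * (w[0] ** 2 + w[1] ** 2)

flat_norm = frobenius_norm(hessian_fd(flat_loss, [0.0, 0.0]))
sharp_norm = frobenius_norm(hessian_fd(sharp_loss, [0.0, 0.0]))
print(flat_norm, sharp_norm)
```

For these quadratics the Hessians are 0.1·I and 10·I, so the exact Frobenius norms are 0.1·√2 and 10·√2. Both minima reach the same loss value, yet the flat one has a Hessian norm one hundred times smaller.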

Abstract

It is widely observed that deep learning models with learned parameters generalize well, even with many more model parameters than training samples. We systematically investigate the underlying reasons why deep neural networks often generalize well, and reveal the difference between the minima (with the same training error) that generalize well and those that do not. We show that it is the characteristics of the landscape of the loss function that explain the good generalization capability. For the landscape of the loss function for deep networks, the volume of the basin of attraction of good minima dominates over that of poor minima, which guarantees that optimization methods with random initialization converge to good minima. We theoretically justify our findings by analyzing 2-layer neural networks, and show that the low-complexity solutions have a small norm of the Hessian matrix…
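The basin-volume argument in the abstract can be illustrated with a minimal one-dimensional sketch. Everything in it is an assumption chosen for illustration, not the paper's setup: a piecewise-quadratic loss on [-1, 1] with a narrow basin (minimum at -0.9) and a wide basin (minimum at 0.1), both reaching loss 0, and plain gradient descent from uniformly random starting points. Because each step moves a point toward the minimum of the basin it starts in, the fraction of runs ending at the wide minimum approximates that basin's share of the domain, here 0.9.

```python
import random

# Hypothetical toy landscape on [-1, 1]: two quadratic basins meeting at BOUNDARY.
# Curvatures are chosen so the loss is continuous at the boundary (0.405 on both sides).
BOUNDARY = -0.8
NARROW_MIN, NARROW_CURV = -0.9, 81.0   # narrow, high-curvature basin
WIDE_MIN, WIDE_CURV = 0.1, 1.0         # wide, low-curvature basin

def grad(x):
    """Gradient of the piecewise-quadratic loss at x."""
    if x < BOUNDARY:
        return NARROW_CURV * (x - NARROW_MIN)
    return WIDE_CURV * (x - WIDE_MIN)

def descend(x, lr=0.01, steps=1500):
    """Plain gradient descent; lr * curvature < 1, so x never leaves its basin."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def fraction_reaching_wide(n_inits=500, seed=0):
    """Fraction of uniformly random initializations that end at the wide minimum."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_inits):
        x_final = descend(rng.uniform(-1.0, 1.0))
        if abs(x_final - WIDE_MIN) < abs(x_final - NARROW_MIN):
            hits += 1
    return hits / n_inits

wide_fraction = fraction_reaching_wide()
print(wide_fraction)
```

In high-dimensional landscapes basin volumes cannot be read off this directly, so the example only conveys the mechanism: when one basin occupies most of the initialization region, random initialization lands in it most of the time.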

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Gaussian Processes and Bayesian Inference