On generalization bounds for deep networks based on loss surface implicit regularization

Masaaki Imaizumi; Johannes Schmidt-Hieber

arXiv:2201.04545 · stat.ML · October 18, 2022

TL;DR

This paper investigates how the local geometry of the loss surface influences the generalization ability of deep neural networks, providing bounds that depend on spectral norms rather than parameter count.

Contribution

It introduces a new framework linking local loss surface geometry to implicit regularization, leading to tighter generalization bounds based on spectral norms.
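The TL;DR and contribution say the bounds scale with spectral norms rather than with the number of parameters. The minimal Python sketch below shows how far apart those two quantities can be. It estimates each layer's spectral norm (largest singular value) by power iteration and compares the product of those norms with the raw parameter count. Spectral-norm-based bounds in the literature typically involve that product. The helper names (`spectral_norm`, `spectral_product`, `param_count`, `scaled_identity`) and the toy layers are hypothetical, and the paper's actual bound is not reproduced here.

```python
import math
import random


def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def spectral_norm(W, iters=100, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    Wt = transpose(W)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]  # random start vector
    for _ in range(iters):
        w = matvec(Wt, matvec(W, v))  # one application of W^T W
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # e.g. W is the zero matrix
        v = [x / norm for x in w]
    Wv = matvec(W, v)
    return math.sqrt(sum(x * x for x in Wv))  # ||W v|| with ||v|| = 1


def spectral_product(layers):
    """Product of layer-wise spectral norms (hypothetical capacity measure)."""
    return math.prod(spectral_norm(W) for W in layers)


def param_count(layers):
    return sum(len(W) * len(W[0]) for W in layers)


def scaled_identity(n, top):
    """n x n identity whose first diagonal entry is replaced by `top`."""
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    W[0][0] = top
    return W


# Toy "networks" (hypothetical): the same spectra embedded in 2x2 and 50x50 layers.
small_net = [scaled_identity(2, 2.0), scaled_identity(2, 1.5)]
large_net = [scaled_identity(50, 2.0), scaled_identity(50, 1.5)]

for name, net in [("small", small_net), ("large", large_net)]:
    print(name, param_count(net), round(spectral_product(net), 6))
```

Both toy networks have a spectral-norm product of 3, while their parameter counts differ by a factor of 625. In this example, a bound driven by the first quantity does not grow with width.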

Findings

1. SGD tends to stay near low-dimensional subspaces because of the local geometry of the loss surface.

2. Generalization bounds depend on spectral norms, not on the parameter count.

3. Conditions under which SGD stagnates imply improved generalization guarantees.

Abstract

The classical statistical learning theory implies that fitting too many parameters leads to overfitting and poor performance. That modern deep neural networks generalize well despite a large number of parameters contradicts this finding and constitutes a major unsolved problem towards explaining the success of deep learning. While previous work focuses on the implicit regularization induced by stochastic gradient descent (SGD), we study here how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that under reasonable assumptions, the local geometry forces SGD to stay close to a low dimensional subspace and that this induces another form of implicit regularization and results in tighter bounds on the generalization error for deep neural networks. To derive generalization error bounds for neural…
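The abstract studies SGD with Gaussian gradient noise near a local minimum. The toy simulation below is our own illustration, not the authors' model or proof. It rests on two assumptions. First, the loss is a quadratic whose Hessian has a few steep and many nearly flat directions. Second, the noise covariance is proportional to that Hessian. Under these assumptions, the iterate's displacement from the minimum concentrates in the steep directions, that is, near a low-dimensional subspace. With isotropic noise the flat directions would instead drift freely, so the outcome hinges on the noise assumption. The function name `noisy_sgd_on_quadratic` and all parameter values are hypothetical.

```python
import math
import random


def noisy_sgd_on_quadratic(curvatures, lr=0.1, noise_scale=1.0, steps=1000, seed=0):
    """Toy SGD with additive Gaussian gradient noise on
    L(theta) = 0.5 * sum_i curvatures[i] * theta[i] ** 2.

    Assumption (ours, hypothetical): the noise in coordinate i has variance
    noise_scale**2 * curvatures[i], i.e. covariance proportional to the Hessian.
    Returns each coordinate's share of the time-averaged squared distance
    from the minimum, measured over the second half of the run.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(curvatures)  # start exactly at the minimum
    sq_sum = [0.0] * len(curvatures)
    burn_in = steps // 2
    for t in range(steps):
        for i, lam in enumerate(curvatures):
            grad = lam * theta[i]  # exact gradient of the quadratic
            noise = rng.gauss(0.0, noise_scale * math.sqrt(lam))
            theta[i] -= lr * (grad + noise)  # noisy gradient step
        if t >= burn_in:
            for i, x in enumerate(theta):
                sq_sum[i] += x * x
    total = sum(sq_sum)
    return [s / total for s in sq_sum]


# 3 steep directions and 97 nearly flat ones (hypothetical values).
curvatures = [1.0] * 3 + [1e-6] * 97
shares = noisy_sgd_on_quadratic(curvatures)
steep_share = sum(shares[:3])
print(f"share of squared displacement in the 3 steep directions: {steep_share:.3f}")
```

Changing `curvatures` or switching to isotropic noise is a quick way to see which assumptions drive the concentration.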

Code & Models

Repositories

insou/pop_minima
noneOfficial

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent