On generalization bounds for deep networks based on loss surface implicit regularization
Masaaki Imaizumi, Johannes Schmidt-Hieber

TL;DR
This paper investigates how the local geometry of the loss surface influences the generalization ability of deep neural networks, providing bounds that depend on spectral norms rather than parameter count.
Contribution
The paper introduces a framework that links the local geometry of the loss surface to implicit regularization, which leads to tighter generalization bounds based on spectral norms.
Findings
SGD tends to stay near low-dimensional subspaces due to local geometry.
Generalization bounds depend on spectral norms, not parameter count.
Conditions for SGD stagnation imply improved generalization guarantees.
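Many spectral-norm-based generalization bounds are built from the product of the layers' spectral norms, ∏_l ||W_l||_2, rather than from the number of weights. The summary does not state the exact quantity used in this paper's bound, so the following is only a minimal sketch, assuming the bound involves this product. It estimates each ||W_l||_2 (the largest singular value) by power iteration in pure Python. The helper names `spectral_norm` and `spectral_product` are hypothetical and not from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(a: Matrix, x: List[float]) -> List[float]:
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a: Matrix) -> Matrix:
    """Return the transpose of a as a list of rows."""
    return [list(col) for col in zip(*a)]


def spectral_norm(w: Matrix, iters: int = 200) -> float:
    """Largest singular value of w, via power iteration on w^T w.

    Assumption: the all-ones start vector is not orthogonal to the top
    right singular vector; a robust version would use a random start.
    """
    wt = transpose(w)
    v = [1.0] * len(w[0])
    for _ in range(iters):
        u = matvec(wt, matvec(w, v))  # apply w^T w to the current vector
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return 0.0  # start vector lies in the null space of w
        v = [x / norm for x in u]  # keep v at unit length
    wv = matvec(w, v)
    return math.sqrt(sum(x * x for x in wv))  # ||w v|| for the converged unit v


def spectral_product(weights: List[Matrix]) -> float:
    """Product of per-layer spectral norms, prod_l ||W_l||_2 (hypothetical helper)."""
    result = 1.0
    for w in weights:
        result *= spectral_norm(w)
    return result
```

For example, `spectral_product([[[3.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 0.5]]])` returns approximately 6.0. An n×n identity matrix has n² parameters but spectral norm 1 for every n, which illustrates why a spectral-norm quantity can stay small while the parameter count grows.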
Abstract
The classical statistical learning theory implies that fitting too many parameters leads to overfitting and poor performance. That modern deep neural networks generalize well despite a large number of parameters contradicts this finding and constitutes a major unsolved problem towards explaining the success of deep learning. While previous work focuses on the implicit regularization induced by stochastic gradient descent (SGD), we study here how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that under reasonable assumptions, the local geometry forces SGD to stay close to a low dimensional subspace and that this induces another form of implicit regularization and results in tighter bounds on the generalization error for deep neural networks. To derive generalization error bounds for neural…
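The abstract refers to SGD with Gaussian gradient noise near a local minimum. As a toy illustration only, and not the paper's model or argument, the sketch below assumes a quadratic approximation L(θ) = ½ θᵀHθ with diagonal Hessian H = diag(λ_1, …, λ_d) and isotropic noise ξ_k ~ N(0, σ²I). Each coordinate then follows θ_{k+1,i} = (1 − ηλ_i) θ_{k,i} − η ξ_{k,i}, whose stationary variance is ησ² / (λ_i (2 − ηλ_i)) for 0 < ηλ_i < 2. Sharp directions (large λ_i) keep the iterate close to the minimum, while flat directions let it spread much further. The functions `noisy_sgd_quadratic` and `stationary_variance` are hypothetical names.

```python
import random
import statistics
from typing import List


def noisy_sgd_quadratic(eigenvalues: List[float], lr: float, sigma: float,
                        steps: int, burn_in: int, seed: int = 0) -> List[float]:
    """Run theta <- theta - lr * (H theta + xi), xi ~ N(0, sigma^2 I), H = diag(eigenvalues).

    Returns the empirical variance of each coordinate over the steps after burn_in.
    """
    rng = random.Random(seed)
    d = len(eigenvalues)
    theta = [0.0] * d  # start at the minimizer theta = 0 of the quadratic loss
    samples: List[List[float]] = [[] for _ in range(d)]
    for step in range(steps):
        for i, lam in enumerate(eigenvalues):
            grad = lam * theta[i] + rng.gauss(0.0, sigma)  # exact gradient plus Gaussian noise
            theta[i] -= lr * grad
            if step >= burn_in:
                samples[i].append(theta[i])
    return [statistics.pvariance(s) for s in samples]


def stationary_variance(lam: float, lr: float, sigma: float) -> float:
    """Closed-form stationary variance lr * sigma^2 / (lam * (2 - lr * lam)), valid for 0 < lr * lam < 2."""
    return lr * sigma ** 2 / (lam * (2.0 - lr * lam))
```

With eigenvalues [10.0, 0.1], lr = 0.05 and sigma = 1.0, the closed form gives a variance of about 0.0033 in the sharp direction and about 0.25 in the flat one, and the simulated variances should land close to these values.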
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
