Generalization Performance of Empirical Risk Minimization on Over-parameterized Deep ReLU Nets
Shao-Bo Lin, Yao Wang, Ding-Xuan Zhou

TL;DR
This paper proves that, for empirical risk minimization (ERM) on over-parameterized deep ReLU networks, there exist global minima that achieve almost optimal generalization error bounds, helping to bridge the gap between optimization and generalization.
Contribution
It introduces a novel deepening scheme for deep ReLU nets and rigorously demonstrates the existence of perfect global minima with strong generalization guarantees.
Findings
Existence of global minima with near-optimal generalization error.
Over-parameterization, which is crucial for SGD to reach global minima of ERM, is compatible with these guarantees.
Theoretical link between optimization and generalization in deep nets.
Abstract
In this paper, we study the generalization performance of global minima for implementing empirical risk minimization (ERM) on over-parameterized deep ReLU nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove that there exist perfect global minima achieving almost optimal generalization error bounds for numerous types of data under mild conditions. Since over-parameterization is crucial to guarantee that the global minima of ERM on deep ReLU nets can be realized by the widely used stochastic gradient descent (SGD) algorithm, our results indeed fill a gap between optimization and generalization.
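To make the phrase "global minimum of ERM in the over-parameterized regime" concrete, the following is a minimal sketch and not the paper's construction. It assumes a square loss and swaps in a much simpler model than a deep ReLU net: a single hidden ReLU layer with fixed random weights, where only the output weights are fit by least squares. All names (`features`, `predict`, `width`) and sizes are hypothetical. With more trainable weights than samples, the empirical risk can be driven to essentially zero even on arbitrary labels, which is why the interesting question is which of these interpolating minima generalize well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: 20 samples, 5 inputs, 200 hidden units (width >> samples).
m, d, width = 20, 5, 200
X = rng.normal(size=(m, d))
y = rng.normal(size=m)  # arbitrary labels, so there is nothing "nice" to fit

# Fixed random hidden layer (not trained); only the output weights are fit.
W = rng.normal(size=(d, width))
b = rng.normal(size=width)


def features(inputs):
    """Hidden-layer ReLU activations for a batch of inputs."""
    return np.maximum(inputs @ W + b, 0.0)


# Least squares over the output weights. The system is underdetermined,
# so lstsq returns the minimum-norm solution among all empirical risk minimizers.
a, *_ = np.linalg.lstsq(features(X), y, rcond=None)


def predict(inputs):
    return features(inputs) @ a


# Empirical risk under the square loss; close to zero means interpolation.
train_risk = float(np.mean((predict(X) - y) ** 2))
print(f"empirical risk: {train_risk:.3e}")
```

The least-squares solver picks one particular minimizer (the minimum-norm one); many other weight vectors attain the same near-zero empirical risk, and the sketch says nothing about how any of them behave on unseen data.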
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
