Generalization Performance of Empirical Risk Minimization on Over-parameterized Deep ReLU Nets

Shao-Bo Lin; Yao Wang; Ding-Xuan Zhou

arXiv:2111.14039 · cs.LG · March 1, 2023 · 1 citation

TL;DR

This paper proves that, for empirical risk minimization (ERM) on over-parameterized deep ReLU nets, there exist global minima that achieve almost optimal generalization error bounds, which helps bridge the gap between optimization and generalization.

Contribution

It introduces a novel deepening scheme for deep ReLU nets and uses it to rigorously prove the existence of perfect global minima with almost optimal generalization error bounds for numerous types of data under mild conditions.

Findings

01 Perfect global minima of ERM exist that achieve almost optimal generalization error bounds.

02 Over-parameterization is crucial to guaranteeing that SGD can realize the global minima of ERM on deep ReLU nets.

03 The results connect optimization and generalization for over-parameterized deep ReLU nets.

Abstract

In this paper, we study the generalization performance of global minima for implementing empirical risk minimization (ERM) on over-parameterized deep ReLU nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove that there exist perfect global minima achieving almost optimal generalization error bounds for numerous types of data under mild conditions. Since over-parameterization is crucial to guarantee that the global minima of ERM on deep ReLU nets can be realized by the widely used stochastic gradient descent (SGD) algorithm, our results indeed fill a gap between optimization and generalization.
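As a rough illustration of what a "perfect" global minimum of ERM means in the over-parameterized regime, the following minimal sketch fits a ReLU net with far more hidden units than samples and checks that the empirical risk is driven to (numerically) zero. Everything in it is an assumption made for illustration and is not taken from the paper: the toy data, the square loss, the one-hidden-layer architecture with fixed hidden units, and the use of least squares in place of SGD. It does not implement the paper's deepening scheme, and it says nothing about generalization.

```python
import numpy as np

# Hypothetical toy data (not from the paper): n noisy samples of a smooth target.
n = 10
x = np.linspace(0.0, 1.0, n)
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

# Over-parameterized one-hidden-layer ReLU net: width is much larger than n.
# Assumption: hidden units are fixed as ReLU(x - c_j) with kinks c_j on a grid,
# so the example is deterministic; only the outer weights are fitted.
width = 200
kinks = np.linspace(-0.5, 1.5, width)


def hidden(inputs):
    """Return the (len(inputs), width) matrix of hidden ReLU activations."""
    return np.maximum(inputs[:, None] - kinks[None, :], 0.0)


# ERM with square loss over the outer weights, solved by minimum-norm
# least squares (a stand-in for SGD, chosen for reproducibility).
outer, *_ = np.linalg.lstsq(hidden(x), y, rcond=None)


def net(inputs):
    """Evaluate the fitted network at the given 1-D inputs."""
    return hidden(inputs) @ outer


# Mean squared error on the training sample, i.e. the empirical risk.
empirical_risk = float(np.mean((net(x) - y) ** 2))
print(f"empirical risk: {empirical_risk:.2e}")
```

Because the square loss is non-negative, any parameter setting with zero empirical risk is a global minimum of ERM. Whether such an interpolating minimum also generalizes well is the question the paper addresses.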

Taxonomy

Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning