Generalization Error Bounds of Gradient Descent for Learning Over-parameterized Deep ReLU Networks

Yuan Cao; Quanquan Gu

arXiv:1902.01384 · cs.LG · November 28, 2019 · 68 cites

TL;DR

This paper derives an algorithm-dependent generalization error bound for over-parameterized deep ReLU networks trained with gradient descent, explaining their good generalization performance beyond traditional uniform convergence bounds.

Contribution

It introduces a new generalization error bound that depends on the training algorithm, specifically gradient descent, for deep ReLU networks.

Findings

01. Gradient descent can achieve arbitrarily small generalization error under certain data assumptions.

02. Existing bounds based on uniform convergence do not explain the good generalization of over-parameterized DNNs.

03. The work provides theoretical insights into why over-parameterized deep networks generalize well.
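
The contrast between uniform-convergence bounds and algorithm-dependent bounds can be written out in generic notation. The block below is a textbook-style sketch, not the paper's theorem: the symbols $\mathcal{D}$ (data distribution), $S$ (training sample of size $n$), $\ell$ (loss), $\mathcal{F}$ (the full function class), $\mathcal{A}$ (the training algorithm), and $\mathcal{F}_{\mathcal{A}}$ (the functions the algorithm can actually reach) are assumptions introduced here for illustration.

```latex
% Population and empirical risk of a predictor f (generic notation, assumed here)
L_{\mathcal{D}}(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\,\ell\bigl(f(x),y\bigr),
\qquad
L_{S}(f) = \frac{1}{n}\sum_{i=1}^{n}\ell\bigl(f(x_i),y_i\bigr).

% A uniform-convergence bound controls the worst case over the whole class:
\sup_{f\in\mathcal{F}}\,\bigl|L_{\mathcal{D}}(f)-L_{S}(f)\bigr|.

% An algorithm-dependent bound only needs to control the gap at the output of A,
% which is at most the supremum over the (possibly much smaller) reachable set:
L_{\mathcal{D}}\bigl(f_{\mathcal{A}(S)}\bigr)-L_{S}\bigl(f_{\mathcal{A}(S)}\bigr)
\;\le\;
\sup_{f\in\mathcal{F}_{\mathcal{A}}}\bigl|L_{\mathcal{D}}(f)-L_{S}(f)\bigr|,
\qquad \mathcal{F}_{\mathcal{A}}\subseteq\mathcal{F}.
```

When $\mathcal{F}$ is rich enough to fit random labels, the supremum over all of $\mathcal{F}$ can be large, which is one way to read the finding that uniform-convergence bounds do not explain the observed generalization.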

Abstract

Empirical studies show that gradient-based methods can learn deep neural networks (DNNs) with very good generalization performance in the over-parameterization regime, where DNNs can easily fit a random labeling of the training data. Very recently, a line of work explains in theory that with over-parameterization and proper random initialization, gradient-based methods can find the global minima of the training loss for DNNs. However, existing generalization error bounds are unable to explain the good generalization performance of over-parameterized DNNs. The major limitation of most existing generalization bounds is that they are based on uniform convergence and are independent of the training algorithm. In this work, we derive an algorithm-dependent generalization error bound for deep ReLU networks, and show that under certain assumptions on the data distribution, gradient descent…
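
The claim that an over-parameterized network can be driven toward fitting random labels by gradient descent can be illustrated with a toy experiment. The following is a minimal sketch under assumptions that are not taken from the paper: a two-layer ReLU network (the paper studies deep networks), squared loss, fixed random output signs, and arbitrary choices for `n`, `d`, `m`, `lr`, and `steps`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 1024  # samples, input dimension, hidden width (m >> n); all assumed values

# Unit-norm random inputs with purely random +/-1 labels (no signal to learn).
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.choice([-1.0, 1.0], size=n)

# Gaussian-initialized hidden weights (trained) and fixed random output signs.
W = rng.normal(size=(m, d))
a = rng.choice([-1.0, 1.0], size=m)

def forward(W):
    """Network output f(x_i) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x_i)."""
    hidden = np.maximum(X @ W.T, 0.0)  # shape (n, m)
    return hidden @ a / np.sqrt(m)

def loss(W):
    """Squared training loss, averaged over the n samples."""
    return 0.5 * np.mean((forward(W) - y) ** 2)

lr, steps = 0.5, 500  # assumed step size and iteration count
losses = [loss(W)]
for _ in range(steps):
    pre = X @ W.T                 # pre-activations, shape (n, m)
    resid = forward(W) - y        # residuals, shape (n,)
    # Gradient of the loss w.r.t. each hidden weight vector w_j.
    grad = ((resid[:, None] * (pre > 0)) * a[None, :]).T @ X / (n * np.sqrt(m))
    W -= lr * grad
    losses.append(loss(W))

print(f"initial training loss: {losses[0]:.4f}")
print(f"final training loss:   {losses[-1]:.4f}")
```

Comparing the two printed values shows how far full-batch gradient descent reduces the training loss on labels that carry no information. The sketch says nothing about test error, which is the quantity the paper's bound addresses.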

Taxonomy

Topics: Machine Learning and ELM · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques