An Improved Analysis of Training Over-parameterized Deep Neural Networks

Difan Zou; Quanquan Gu

arXiv:1906.04688 · cs.LG · June 12, 2019 · 47 cites

Open Access

TL;DR

This paper improves the theoretical analysis of training over-parameterized deep neural networks, showing that (stochastic) gradient descent converges to a global minimum of the training loss under milder width conditions than previously established.

Contribution

The paper provides a tighter analysis of gradient descent convergence that requires less over-parameterization, especially for two-layer neural networks.

Findings

01 Convergence achieved with milder over-parameterization conditions.

02 Tighter gradient lower bounds lead to faster convergence.

03 Sharper trajectory length characterization improves theoretical bounds.

Abstract

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network to ensure the global convergence is very stringent, which is often a high-degree polynomial in the training sample size $n$ (e.g., $O(n^{24})$). In this paper, we provide an improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks, which only requires a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters. The main technical contributions of our analysis include (a) a tighter gradient lower bound that leads to a faster convergence of the algorithm, and (b) a sharper…
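
The setting described in the abstract can be illustrated with a small numerical sketch: full-batch gradient descent on a wide two-layer ReLU network with random initialization, fit to a handful of training points. Everything in it is an assumption chosen for illustration and is not taken from the paper: the function name `train_two_layer_relu`, the $1/\sqrt{m}$ output scaling, the fixed random-sign output layer, the squared loss, and the data, width, and step-size values. It does not reproduce the paper's width condition or its analysis; it only shows the training loss shrinking when the hidden width `m` is much larger than the sample size `n`.

```python
import numpy as np


def train_two_layer_relu(n=10, d=5, m=2000, lr=0.1, steps=500, seed=0):
    """Gradient descent on f(x) = a^T relu(W x) / sqrt(m), training W only.

    Illustrative setup (an assumption, not the paper's exact model):
    unit-norm random inputs, random +/-1 labels, Gaussian W, fixed +/-1 a.
    Returns the training loss recorded before each update.
    """
    rng = np.random.default_rng(seed)

    # Synthetic training set: n unit-norm inputs with arbitrary +/-1 labels.
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = rng.choice([-1.0, 1.0], size=n)

    # Random initialization; the output layer a stays fixed during training.
    W = rng.standard_normal((m, d))
    a = rng.choice([-1.0, 1.0], size=m)

    losses = []
    for _ in range(steps):
        pre = X @ W.T                                # (n, m) pre-activations
        out = np.maximum(pre, 0.0) @ a / np.sqrt(m)  # (n,) network outputs
        resid = out - y
        losses.append(0.5 * np.sum(resid ** 2))      # squared training loss

        # dL/dw_r = sum_i resid_i * a_r * 1[w_r . x_i > 0] * x_i / sqrt(m)
        mask = (pre > 0).astype(float)               # ReLU derivative
        grad = (mask * resid[:, None] * a[None, :]).T @ X / np.sqrt(m)
        W -= lr * grad

    return np.array(losses)


if __name__ == "__main__":
    losses = train_two_layer_relu()
    print(f"initial loss: {losses[0]:.4f}, final loss: {losses[-1]:.6f}")
```

Rerunning with a smaller `m` (say, close to `n`) and comparing the loss curves gives an informal feel for why width matters, though any such comparison is empirical and says nothing about the polynomial degree of the width requirement.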

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications