An Improved Analysis of Training Over-parameterized Deep Neural Networks
Difan Zou, Quanquan Gu

TL;DR
This paper gives an improved convergence analysis for training over-parameterized deep neural networks, showing that (stochastic) gradient descent reaches a global minimum of the training loss under milder width conditions than previously established.
Contribution
It provides a tighter analysis of gradient descent convergence requiring less over-parameterization, especially for two-layer neural networks.
Findings
Convergence achieved with milder over-parameterization conditions.
Tighter gradient lower bounds lead to faster convergence.
Sharper trajectory length characterization improves theoretical bounds.
Abstract
A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network to ensure the global convergence is very stringent, which is often a high-degree polynomial in the training sample size. In this paper, we provide an improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks, which only requires a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters. The main technical contributions of our analysis include (a) a tighter gradient lower bound that leads to a faster convergence of the algorithm, and (b) a sharper…
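As a rough illustration of the setting the abstract describes, the sketch below runs plain gradient descent on a wide two-layer ReLU network with random initialization and records the training loss. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name train_two_layer_relu, the squared loss, the fixed random ±1 output layer, the unit-norm synthetic inputs, and the particular width, step size, and step count. It does not reproduce the paper's bounds; it only shows the kind of loss decrease that such analyses aim to guarantee.

```python
import math
import random


def train_two_layer_relu(n=5, d=10, m=100, eta=0.1, steps=100, seed=0):
    """Gradient descent on f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x).

    Illustrative assumptions: squared loss, only the hidden weights W are
    trained, the output weights a_r are fixed random signs, and the inputs
    are synthetic unit-norm vectors with random +/-1 labels.
    Returns the list of training losses, one per iterate (steps + 1 values).
    """
    rng = random.Random(seed)

    # Synthetic training set: n unit-norm inputs in R^d with +/-1 labels.
    xs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        xs.append([c / norm for c in v])
    ys = [rng.choice([-1.0, 1.0]) for _ in range(n)]

    # Random initialization: Gaussian hidden weights, fixed sign output layer.
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.choice([-1.0, 1.0]) for _ in range(m)]
    scale = 1.0 / math.sqrt(m)

    losses = []
    for t in range(steps + 1):
        # Forward pass: pre[i][r] = w_r . x_i for every sample and neuron.
        pre = [[sum(wk * xk for wk, xk in zip(w, x)) for w in W] for x in xs]
        preds = [scale * sum(ar * max(z, 0.0) for ar, z in zip(a, row))
                 for row in pre]
        resid = [p - y for p, y in zip(preds, ys)]
        losses.append(0.5 * sum(r * r for r in resid))
        if t == steps:
            break

        # Backward pass: dL/dw_r = sum_i resid_i * a_r * scale * 1[pre > 0] * x_i.
        for r in range(m):
            for i in range(n):
                if pre[i][r] > 0.0:
                    coef = eta * resid[i] * a[r] * scale
                    for k in range(d):
                        W[r][k] -= coef * xs[i][k]
    return losses


if __name__ == "__main__":
    losses = train_two_layer_relu()
    print("initial loss:", losses[0])
    print("final loss:  ", losses[-1])
```

Increasing the hypothetical width m while keeping n fixed is one way to explore the over-parameterized regime; how large m must be for a convergence guarantee is exactly the width condition the paper's analysis tightens.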
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications
