Provable Convergence of Nesterov's Accelerated Gradient Method for Over-Parameterized Neural Networks
Xin Liu, Zhisong Pan, Wei Tao

TL;DR
This paper proves that Nesterov's accelerated gradient method (NAG) converges linearly to a global minimum when training over-parameterized neural networks, at a rate faster than that of standard gradient descent, which helps explain its empirical success.
Contribution
The paper provides the first theoretical guarantee that Nesterov's method achieves an accelerated linear convergence rate when training over-parameterized neural networks.
Findings
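NAG converges at a linear rate of (1 - Θ(1/√κ))^t, where κ > 1 is a condition number and t is the number of iterations, compared with (1 - Θ(1/κ))^t for GD
NAG's convergence rate is comparable to that of the heavy ball (HB) method
Experimental results validate the theoretical convergence guarantees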
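The gap between a Θ(1/κ) and a Θ(1/√κ) term in the rate can be illustrated on a much simpler problem than the paper's. The sketch below is a toy under stated assumptions: it runs GD, HB, and NAG on a hypothetical diagonal quadratic with condition number κ = L/μ, using textbook step sizes and momentum values for strongly convex quadratics (not the paper's parameter choices or its over-parameterized network), and counts iterations until the iterate norm shrinks by a factor `tol`. The helpers `make_quadratic`, `norm`, and `iterations_to_tol` are hypothetical names introduced only for this illustration.

```python
import math


def make_quadratic(kappa, n=20):
    """Hypothetical toy problem f(x) = 0.5 * sum(a_i * x_i**2).

    Eigenvalues are log-spaced in [1, kappa], so mu = 1, L = kappa and the
    condition number L / mu equals kappa. Returns (eigenvalues, gradient).
    """
    eigs = [kappa ** (i / (n - 1)) for i in range(n)]

    def grad(x):
        return [a * xi for a, xi in zip(eigs, x)]

    return eigs, grad


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def iterations_to_tol(method, kappa, tol=1e-6, max_iter=200_000):
    """Iterations until ||x_t|| <= tol * ||x_0|| (the minimizer is x* = 0)."""
    eigs, grad = make_quadratic(kappa)
    mu, L = min(eigs), max(eigs)
    s = math.sqrt(L / mu)  # square root of the condition number
    if method == "gd":
        # Fixed step 2 / (L + mu), no momentum.
        eta, beta = 2.0 / (L + mu), 0.0
    elif method == "hb":
        # Polyak's tuning of heavy ball for quadratics.
        eta = 4.0 / (math.sqrt(L) + math.sqrt(mu)) ** 2
        beta = ((s - 1) / (s + 1)) ** 2
    elif method == "nag":
        # Constant-momentum Nesterov scheme for strongly convex problems.
        eta, beta = 1.0 / L, (s - 1) / (s + 1)
    else:
        raise ValueError(f"unknown method: {method}")

    x_prev = x = [1.0] * len(eigs)
    x0_norm = norm(x)
    for t in range(1, max_iter + 1):
        if method == "nag":
            # Gradient is evaluated at the extrapolated (look-ahead) point y.
            y = [xi + beta * (xi - xp) for xi, xp in zip(x, x_prev)]
            x_new = [yi - eta * g for yi, g in zip(y, grad(y))]
        else:
            # GD (beta = 0) and HB: gradient at x plus a momentum term.
            x_new = [xi - eta * g + beta * (xi - xp)
                     for xi, g, xp in zip(x, grad(x), x_prev)]
        x_prev, x = x, x_new
        if norm(x) <= tol * x0_norm:
            return t
    return max_iter


if __name__ == "__main__":
    for kappa in (100, 1600):
        counts = {m: iterations_to_tol(m, kappa) for m in ("gd", "hb", "nag")}
        print(f"kappa={kappa}: {counts}")
```

Going from κ = 100 to κ = 1600 should multiply the GD iteration count by roughly 16, but the HB and NAG counts only by a factor of roughly 4 to 5, mirroring the 1/κ versus 1/√κ terms in the rates. This quadratic toy does not reproduce the paper's neural-network analysis; it only shows what the rate difference means in iteration counts.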
Abstract
Momentum methods, such as the heavy ball method (HB) and Nesterov's accelerated gradient method (NAG), have been widely used in training neural networks; they incorporate the history of gradients into the current update. In practice, they often outperform (stochastic) gradient descent (GD) and converge faster. Despite these empirical successes, theoretical understanding of their accelerated convergence rates is still lacking. Recently, some attempts have been made by analyzing the trajectories of gradient-based methods in an over-parameterized regime, where the number of parameters is significantly larger than the number of training instances. However, most existing theoretical work is concerned with GD, and the established convergence result for NAG is inferior to those for HB and GD, which fails to explain the practical success of NAG.…
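For reference, the three methods can be written in one common textbook form with step size η and momentum parameter β. This parameterization is an assumption here; the paper's formulation for its network setting may use different notation or step-size choices. HB adds a momentum term to a gradient step taken at the current iterate, while NAG evaluates the gradient at the extrapolated point:

```latex
\begin{aligned}
\text{GD:}\quad  & \theta_{t+1} = \theta_t - \eta \nabla f(\theta_t) \\
\text{HB:}\quad  & \theta_{t+1} = \theta_t - \eta \nabla f(\theta_t) + \beta(\theta_t - \theta_{t-1}) \\
\text{NAG:}\quad & \theta_{t+1} = \theta_t + \beta(\theta_t - \theta_{t-1})
                   - \eta \nabla f\big(\theta_t + \beta(\theta_t - \theta_{t-1})\big)
\end{aligned}
```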
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Neural Network Applications
