A Modular Analysis of Provable Acceleration via Polyak's Momentum: Training a Wide ReLU Network and a Deep Linear Network
Jun-Kun Wang, Chi-Heng Lin, Jacob Abernethy

TL;DR
This paper proves that Polyak's momentum accelerates the training of a one-layer wide ReLU network and a deep linear network, improving on the known convergence rates of gradient descent for these models.
Contribution
It offers a modular analysis demonstrating that Polyak's momentum achieves provable acceleration in neural network training, with explicit non-asymptotic convergence rates.
Findings
Polyak's momentum achieves a non-asymptotic accelerated linear rate on strongly convex quadratic problems.
Polyak's momentum accelerates training of wide ReLU networks with rate $\left(1-\frac{1}{\sqrt{\text{condition number}}}\right)^t$.
Polyak's momentum accelerates deep linear network training with rate $\left(1-\frac{1}{\sqrt{\text{condition number}}}\right)^t$.
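To make the quadratic finding concrete, here is a minimal Python sketch comparing gradient descent with Polyak's momentum (the heavy-ball update $w_{t+1} = w_t - \eta \nabla f(w_t) + \beta (w_t - w_{t-1})$) on a synthetic strongly convex quadratic. It is an illustration, not the paper's experiment: the function names, the problem generator, and the step size and momentum values are all assumptions. The parameters are the classical textbook heavy-ball choice for quadratics and may differ from those analyzed in the paper.

```python
import numpy as np


def make_quadratic(dim=20, mu=1.0, L=100.0, seed=0):
    """Build f(w) = 0.5 * w^T A w - b^T w with eigenvalues of A in [mu, L].

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Random orthogonal basis from a QR factorization.
    Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    eigs = np.linspace(mu, L, dim)
    A = (Q * eigs) @ Q.T          # Q @ diag(eigs) @ Q.T
    A = 0.5 * (A + A.T)           # remove round-off asymmetry
    b = rng.standard_normal(dim)
    return A, b


def gradient_descent(A, b, steps, L):
    """Plain gradient descent with step size 1/L."""
    w = np.zeros_like(b)
    for _ in range(steps):
        w = w - (1.0 / L) * (A @ w - b)
    return w


def heavy_ball(A, b, steps, mu, L):
    """Polyak's momentum with the classical parameter choice for quadratics.

    Assumed here (not taken from the paper):
    eta  = 4 / (sqrt(L) + sqrt(mu))^2
    beta = ((sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)))^2
    """
    eta = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
    w_prev = np.zeros_like(b)
    w = np.zeros_like(b)
    for _ in range(steps):
        # Gradient step plus a momentum term along the previous displacement.
        w_next = w - eta * (A @ w - b) + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w


if __name__ == "__main__":
    mu, L, steps = 1.0, 100.0, 200   # condition number L / mu = 100
    A, b = make_quadratic(mu=mu, L=L)
    w_star = np.linalg.solve(A, b)   # exact minimizer for reference
    gd_err = np.linalg.norm(gradient_descent(A, b, steps, L) - w_star)
    hb_err = np.linalg.norm(heavy_ball(A, b, steps, mu, L) - w_star)
    print(f"gradient descent error: {gd_err:.3e}")
    print(f"heavy-ball error:       {hb_err:.3e}")
```

With the same iteration budget, the heavy-ball iterate should land much closer to the minimizer, reflecting a per-step contraction governed by the square root of the condition number rather than the condition number itself.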
Abstract
Incorporating a so-called "momentum" dynamic in gradient descent methods is widely used in neural net training as it has been broadly observed that, at least empirically, it often leads to significantly faster convergence. At the same time, there are very few theoretical guarantees in the literature to explain this apparent acceleration effect. Even for the classical strongly convex quadratic problems, several existing results only show Polyak's momentum has an accelerated linear rate asymptotically. In this paper, we first revisit the quadratic problems and show a non-asymptotic accelerated linear rate of Polyak's momentum. Then, we provably show that Polyak's momentum achieves acceleration for training a one-layer wide ReLU network and a deep linear network, which are perhaps the two most popular canonical models for studying optimization and deep learning in the literature. Prior…
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
