A Modular Analysis of Provable Acceleration via Polyak's Momentum: Training a Wide ReLU Network and a Deep Linear Network

Jun-Kun Wang; Chi-Heng Lin; Jacob Abernethy

arXiv:2010.01618·cs.LG·June 14, 2021

TL;DR

This paper provides theoretical guarantees that Polyak's momentum accelerates the training of wide ReLU networks and deep linear networks, improving on the convergence rates known for plain gradient descent.

Contribution

It offers a modular analysis demonstrating that Polyak's momentum achieves provable acceleration in neural network training, with explicit non-asymptotic convergence rates.

Findings

01

Polyak's momentum achieves a non-asymptotic accelerated linear rate on strongly convex quadratic problems.

02

Polyak's momentum accelerates training of wide ReLU networks with rate $\left(1-\frac{1}{\sqrt{\text{condition number}}}\right)^t$.

03

Polyak's momentum accelerates deep linear network training with rate $\left(1-\frac{1}{\sqrt{\text{condition number}}}\right)^t$.
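
To make the first finding concrete, here is a minimal sketch that compares plain gradient descent with Polyak's momentum (heavy ball) on a strongly convex quadratic. It rests on several assumptions that are not taken from the paper: the quadratic is diagonal, $f(w) = \frac{1}{2}\sum_i \lambda_i w_i^2$, with condition number 100; the step size and momentum parameter use the classical textbook tuning, which may differ from the choice analyzed in the paper; and all function and variable names are hypothetical.

```python
import math


def gradient(w, eigs):
    # Gradient of f(w) = 0.5 * sum_i eigs[i] * w[i]**2 (minimizer is w = 0).
    return [lam * wi for lam, wi in zip(eigs, w)]


def gradient_descent(w0, eigs, steps):
    # Plain gradient descent with the classical step size 2 / (L + mu).
    mu, L = min(eigs), max(eigs)
    eta = 2.0 / (L + mu)
    w = list(w0)
    for _ in range(steps):
        g = gradient(w, eigs)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w


def polyak_momentum(w0, eigs, steps):
    # Heavy-ball update: w_{t+1} = w_t - eta * grad f(w_t) + beta * (w_t - w_{t-1}).
    # Assumed tuning (classical, not necessarily the paper's):
    #   eta  = 4 / (sqrt(L) + sqrt(mu))**2
    #   beta = ((sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)))**2
    mu, L = min(eigs), max(eigs)
    sqrt_L, sqrt_mu = math.sqrt(L), math.sqrt(mu)
    eta = 4.0 / (sqrt_L + sqrt_mu) ** 2
    beta = ((sqrt_L - sqrt_mu) / (sqrt_L + sqrt_mu)) ** 2
    w_prev = list(w0)  # w_{-1} = w_0, so the first step has no momentum
    w = list(w0)
    for _ in range(steps):
        g = gradient(w, eigs)
        w_next = [
            wi - eta * gi + beta * (wi - wpi)
            for wi, gi, wpi in zip(w, g, w_prev)
        ]
        w_prev, w = w, w_next
    return w


def norm(w):
    return math.sqrt(sum(x * x for x in w))


# Eigenvalues spread evenly over [1, 100], so the condition number is 100.
eigs = [1.0 + 99.0 * i / 19 for i in range(20)]
w0 = [1.0] * 20

gd_err = norm(gradient_descent(w0, eigs, 200))
hb_err = norm(polyak_momentum(w0, eigs, 200))
print(f"gradient descent distance to optimum: {gd_err:.3e}")
print(f"Polyak momentum distance to optimum:  {hb_err:.3e}")
```

With these settings, gradient descent contracts by roughly $1 - 2/\kappa$ per step while the heavy-ball iterates contract by roughly $1 - 2/\sqrt{\kappa}$, so after the same number of steps the momentum run should sit many orders of magnitude closer to the optimum.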

Abstract

Incorporating a so-called "momentum" dynamic in gradient descent methods is widely used in neural net training as it has been broadly observed that, at least empirically, it often leads to significantly faster convergence. At the same time, there are very few theoretical guarantees in the literature to explain this apparent acceleration effect. Even for the classical strongly convex quadratic problems, several existing results only show Polyak's momentum has an accelerated linear rate asymptotically. In this paper, we first revisit the quadratic problems and show a non-asymptotic accelerated linear rate of Polyak's momentum. Then, we provably show that Polyak's momentum achieves acceleration for training a one-layer wide ReLU network and a deep linear network, which are perhaps the two most popular canonical models for studying optimization and deep learning in the literature. Prior…

Videos

A Modular Analysis of Provable Acceleration via Polyak's Momentum: Training a Wide ReLU Network and a Deep Linear Network · SlidesLive

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
