Provable Super-Convergence with a Large Cyclical Learning Rate

Samet Oymak

arXiv:2102.10734·cs.LG·September 8, 2021

TL;DR

This paper shows that a cyclical learning rate schedule that periodically takes one large, unstable step can provably achieve super-convergence on problems whose Hessian spectrum is bimodal, challenging the conventional view that the learning rate must stay in the stable regime.

Contribution

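The paper introduces a cyclical learning rate scheme that alternates one large unstable step with several small stable steps, proves that its convergence rate depends only logarithmically on the condition number, and helps explain the super-convergence observed empirically by Smith and Topin [2019].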

Findings

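01. A cyclical schedule with periodic large, unstable steps achieves super-convergence on certain problems.

02. The scheme's convergence rate depends only logarithmically on the condition number of the problem.

03. The scheme is most effective when the Hessian has a bimodal spectrum, with eigenvalues grouped into a small cluster and a large cluster.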

Abstract

Conventional wisdom dictates that the learning rate should be in the stable regime so that gradient-based algorithms don't blow up. This letter introduces a simple scenario where an unstably large learning rate scheme leads to super-fast convergence, with the convergence rate depending only logarithmically on the condition number of the problem. Our scheme uses a Cyclical Learning Rate (CLR) where we periodically take one large unstable step and several small stable steps to compensate for the instability. These findings also help explain the empirical observations of [Smith and Topin, 2019] where they show that CLR with a large maximum learning rate can dramatically accelerate learning and lead to so-called "super-convergence". We prove that our scheme excels in problems where the Hessian exhibits a bimodal spectrum and the eigenvalues can be grouped into two clusters (small and large).…
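The mechanism described in the abstract can be illustrated with a minimal sketch on a diagonal quadratic. This is not the paper's code or its exact step-size rule. The eigenvalue clusters, the choice of 14 small steps per cycle, and the helper name `gd_quadratic` are all assumptions made for this example. The large step is set from the small cluster, so it contracts those directions while inflating the large-eigenvalue directions. The small steps are set from the largest eigenvalue and undo that inflation.

```python
import numpy as np

def gd_quadratic(eigs, x0, step_sizes):
    """Gradient descent on f(x) = 0.5 * sum(eigs * x**2), whose gradient is eigs * x."""
    x = x0.copy()
    for eta in step_sizes:
        x = x - eta * eigs * x
    return x

# Bimodal spectrum (illustrative values): a small and a large eigenvalue cluster.
small = np.linspace(1.0, 2.0, 5)
large = np.linspace(5e3, 1e4, 5)
eigs = np.concatenate([small, large])   # condition number is 1e4
x0 = np.ones_like(eigs)                 # the minimizer is the origin

eta_large = 1.0 / small.max()           # unstable for the large cluster
eta_small = 1.0 / large.max()           # stable for every eigenvalue

# One cycle: a single large step, then 14 small steps (assumed count) that
# shrink the large-cluster components by at least 0.5 per step.
cycle = [eta_large] + [eta_small] * 14
n_cycles = 20
clr_steps = cycle * n_cycles
const_steps = [eta_small] * len(clr_steps)   # stable baseline, same budget

x_clr = gd_quadratic(eigs, x0, clr_steps)
x_const = gd_quadratic(eigs, x0, const_steps)

err_clr = np.linalg.norm(x_clr)
err_const = np.linalg.norm(x_const)
print(f"cyclical schedule error: {err_clr:.3e}")
print(f"constant schedule error: {err_const:.3e}")
```

With these values, each cycle shrinks every coordinate by at least a factor of two. The constant stable step barely moves the small-eigenvalue coordinates within the same 300 iterations. The number of small steps needed per cycle grows with the logarithm of the blow-up factor, which is consistent with the logarithmic dependence on the condition number stated in the abstract.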
