Provable Super-Convergence with a Large Cyclical Learning Rate
Samet Oymak

TL;DR
This paper demonstrates that a cyclical learning rate scheme with large unstable steps can achieve super-convergence, especially in problems with bimodal Hessian spectra, challenging traditional stable learning rate assumptions.
Contribution
The paper introduces a cyclical learning rate scheme that exploits instability for faster convergence, supports it with theoretical analysis, and offers an explanation for empirically observed super-convergence.
Findings
Large cyclical learning rates enable super-convergence in certain problems.
The scheme's convergence rate depends only logarithmically on the condition number of the problem.
It is particularly effective when the Hessian has a bimodal spectrum.
Abstract
Conventional wisdom dictates that the learning rate should be in the stable regime so that gradient-based algorithms don't blow up. This letter introduces a simple scenario where an unstably large learning rate scheme leads to super-fast convergence, with the convergence rate depending only logarithmically on the condition number of the problem. Our scheme uses a Cyclical Learning Rate (CLR) where we periodically take one large unstable step and several small stable steps to compensate for the instability. These findings also help explain the empirical observations of [Smith and Topin, 2019], who show that CLR with a large maximum learning rate can dramatically accelerate learning and lead to so-called "super-convergence". We prove that our scheme excels on problems where the Hessian exhibits a bimodal spectrum and the eigenvalues can be grouped into two clusters (small and large).…
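The "one large unstable step, then several small stable steps" idea can be illustrated on a toy problem. The following is a minimal sketch, not the paper's exact schedule or analysis: it assumes a diagonal quadratic objective, and the eigenvalues, step sizes, cycle length, and the helper name `run_gd` are all hypothetical choices made for illustration.

```python
import math

def run_gd(eigs, x0, step_sizes):
    """Run gradient descent on f(x) = 0.5 * sum(eig_i * x_i**2).

    Returns the Euclidean distance to the minimizer (the origin)
    after applying every step size in `step_sizes` in order.
    """
    x = list(x0)
    for eta in step_sizes:
        # The gradient of the diagonal quadratic is eig_i * x_i.
        x = [xi - eta * lam * xi for lam, xi in zip(eigs, x)]
    return math.sqrt(sum(xi * xi for xi in x))

# Assumed bimodal spectrum: a small cluster and a large cluster.
small_cluster = [1.0, 1.5, 2.0]
large_cluster = [1000.0, 1500.0, 2000.0]
eigs = small_cluster + large_cluster
x0 = [1.0] * len(eigs)

# Small step: stable for every eigenvalue (below 2 / max eigenvalue).
eta_small = 1.0 / max(eigs)
# Large step: tuned to the small cluster, far above the stability
# threshold, so it amplifies the large-cluster directions.
eta_large = 1.0 / max(small_cluster)

# One cycle = one large step followed by a few small steps that
# shrink the amplified large-cluster components back down.
small_steps_per_cycle = 12  # assumed; roughly log(condition number)
cycles = 10
clr_schedule = ([eta_large] + [eta_small] * small_steps_per_cycle) * cycles

# Baseline: the same number of steps, all at the stable small rate.
gd_schedule = [eta_small] * len(clr_schedule)

clr_err = run_gd(eigs, x0, clr_schedule)
gd_err = run_gd(eigs, x0, gd_schedule)
print(f"cyclical schedule error: {clr_err:.3e}")
print(f"constant small-step error: {gd_err:.3e}")
```

In this toy setting the large step makes fast progress on the small-eigenvalue directions while temporarily blowing up the large-eigenvalue ones; the subsequent small steps contract those again, so each full cycle is contractive in every direction. With the same step budget, the constant small-step baseline barely moves along the small-eigenvalue directions.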
