Universal halting times in optimization and machine learning
Levent Sagun, Thomas Trogdon, Yann LeCun

TL;DR
This paper studies the empirical distribution of halting times (the number of iterations needed to reach a given accuracy) for optimization algorithms applied to random systems. After centering and scaling, the distribution of the halting time does not depend on the specific distribution placed on the landscape.
Contribution
It identifies universal distribution classes for halting times in optimization, demonstrating their independence from the underlying landscape distribution across various systems.
Findings
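Halting-time fluctuations fall into two qualitative classes: Gumbel-like and Gaussian-like.
After centering and scaling, the halting-time distribution is unchanged when the distribution on the landscape is changed.
The Gumbel-like class covers Google searches, human decision times, the QR eigenvalue algorithm, and spin glasses; the Gaussian-like class covers the conjugate gradient method and deep networks trained on MNIST or random input data.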
Abstract
The authors present empirical distributions for the halting time (measured by the number of iterations to reach a given accuracy) of optimization algorithms applied to two random systems: spin glasses and deep learning. Given an algorithm, which the authors take to be both the optimization routine and the form of the random landscape, the fluctuations of the halting time follow a distribution that, after centering and scaling, remains unchanged even when the distribution on the landscape is changed. They observe two qualitative classes: a Gumbel-like distribution that appears in Google searches, human decision times, the QR eigenvalue algorithm, and spin glasses, and a Gaussian-like distribution that appears in the conjugate gradient method, deep networks with MNIST input data, and deep networks with random input data. This empirical evidence suggests the presence of a class of distributions for which the…
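The centering and scaling described in the abstract can be illustrated with a small experiment. The sketch below is not the authors' experimental setup: it runs a textbook conjugate gradient solver on random sample-covariance matrices, records the iteration count needed to reach a residual tolerance, and normalizes the counts as (T - mean) / std for two different entry distributions. The function names, matrix sizes, tolerance, and trial count are all assumptions chosen for illustration.

```python
import numpy as np


def cg_halting_time(A, b, tol=1e-8, max_iter=10_000):
    """Return the number of conjugate gradient iterations until ||residual|| < tol."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual (a fresh array, so b is not modified)
    p = r.copy()           # initial search direction
    rs = r @ r
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            return k
        p = r + (rs_new / rs) * p
        rs = rs_new
    return max_iter


def sample_halting_times(entry_sampler, n=50, m=100, trials=200, seed=0):
    """Halting times of CG on A = X X^T / m, with the entries of X drawn by entry_sampler."""
    rng = np.random.default_rng(seed)
    times = np.empty(trials)
    for i in range(trials):
        X = entry_sampler(rng, (n, m))
        A = X @ X.T / m                 # positive definite with high probability when m > n
        b = rng.standard_normal(n)
        b /= np.linalg.norm(b)
        times[i] = cg_halting_time(A, b)
    return times


def normalize(times):
    """Center and scale halting times to zero mean and unit variance."""
    return (times - times.mean()) / times.std()


# Two hypothetical "landscape" distributions: Gaussian and +/-1 Bernoulli entries.
def gaussian(rng, shape):
    return rng.standard_normal(shape)


def bernoulli(rng, shape):
    return rng.choice([-1.0, 1.0], size=shape)


t_gauss = normalize(sample_halting_times(gaussian, seed=0))
t_bern = normalize(sample_halting_times(bernoulli, seed=1))
```

Plotting histograms of `t_gauss` and `t_bern` on the same axes is one way to check by eye whether the two normalized distributions overlap, which is the kind of comparison the abstract describes. With only 200 trials per ensemble, any agreement would be suggestive rather than conclusive.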
