Local Optimality and Generalization Guarantees for the Langevin Algorithm via Empirical Metastability
Belinda Tzen, Tengyuan Liang, Maxim Raginsky

TL;DR
This paper analyzes the path-wise behavior of the Langevin algorithm in non-convex ERM, revealing empirical metastability and providing insights into its convergence and escape times, with implications for generalization.
Contribution
It introduces a metastability framework for Langevin dynamics in non-convex optimization, connecting path-wise behavior with generalization guarantees and escape time scaling.
Findings
Langevin trajectories exhibit empirical metastability near local optima.
Escape times can be exponentially long, consistent with the Eyring-Kramers law.
The Langevin algorithm can visit all optima without strong initialization conditions.
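For orientation, the classical Eyring-Kramers law in its simplest one-dimensional, continuous-time form can be written as below. This is the textbook statement, not a formula quoted from the paper; the symbols F (potential), β (inverse temperature), x* (local minimum), and z* (neighboring barrier point) are assumed here for illustration only.

```latex
% Overdamped Langevin diffusion in one dimension (notation assumed for illustration):
%   dX_t = -F'(X_t)\,dt + \sqrt{2/\beta}\,dW_t
% x^* : local minimum of F,   z^* : adjacent barrier point (local maximum in 1D)
\mathbb{E}[\tau_{\mathrm{esc}}] \;\approx\;
  \frac{2\pi}{\sqrt{F''(x^*)\,\lvert F''(z^*)\rvert}}
  \exp\!\bigl(\beta\,[F(z^*) - F(x^*)]\bigr)
  \qquad (\beta \to \infty)
```

The exponential dependence on the barrier height F(z*) − F(x*) is what makes the escape time potentially very long at low noise.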
Abstract
We study the detailed path-wise behavior of the discrete-time Langevin algorithm for non-convex Empirical Risk Minimization (ERM) through the lens of metastability, adopting some techniques from Berglund and Gentz (2003). For a particular local optimum of the empirical risk, with an arbitrary initialization, we show that, with high probability, at least one of the following two events will occur: (1) the Langevin trajectory ends up somewhere outside the ε-neighborhood of this particular optimum within a short recurrence time; (2) it enters this ε-neighborhood by the recurrence time and stays there until a potentially exponentially long escape time. We call this phenomenon empirical metastability. This two-timescale characterization aligns nicely with the existing literature in the following two senses. First, the effective recurrence time (i.e., number of…
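The two timescales described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's setup: the one-dimensional double-well risk F(w) = (w² − 1)²/4, the step size `eta`, the inverse temperature `beta`, and the separate `escape_radius` (a larger ball used to declare an escape, so that boundary jitter is not counted as one) are all assumptions made for illustration. The update is the standard discrete-time Langevin step w ← w − η∇F(w) + √(2η/β)·ξ with ξ standard Gaussian.

```python
import math
import random


def grad_F(w):
    """Gradient of the toy double-well risk F(w) = (w**2 - 1)**2 / 4 (illustrative)."""
    return w**3 - w


def langevin_hitting_times(w0, w_star, eps, escape_radius, eta, beta, n_steps, seed=0):
    """Run the discrete-time Langevin algorithm and record two step counts.

    entry_step:  first step at which |w - w_star| <= eps (None if never).
    escape_step: first later step at which |w - w_star| > escape_radius
                 (None if the iterate stays inside for the rest of the run).
    All parameter names are hypothetical; they are not taken from the paper.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * eta / beta)  # std of the injected Gaussian noise
    w = w0
    entry_step = None
    escape_step = None
    for k in range(1, n_steps + 1):
        # Langevin update: gradient step plus isotropic Gaussian perturbation.
        w = w - eta * grad_F(w) + noise_scale * rng.gauss(0.0, 1.0)
        dist = abs(w - w_star)
        if entry_step is None:
            if dist <= eps:
                entry_step = k
        elif dist > escape_radius:
            escape_step = k
            break
    return entry_step, escape_step, w


# Low-noise run started outside the eps-neighborhood of the optimum w_star = 1.
entry, escape, w_final = langevin_hitting_times(
    w0=0.4, w_star=1.0, eps=0.2, escape_radius=0.6,
    eta=0.01, beta=200.0, n_steps=5000,
)
print("entry step:", entry, "escape step:", escape, "final iterate:", w_final)
```

In this low-noise regime the iterate should reach the neighborhood quickly and then remain near the optimum for the whole run, which mirrors event (2) in the abstract. Lowering `beta` (more noise) makes escapes observable within a feasible number of steps.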
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques
