A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics
Yuchen Zhang, Percy Liang, Moses Charikar

TL;DR
This paper uses a hitting time framework to analyze how quickly Stochastic Gradient Langevin Dynamics (SGLD) reaches a target subset of the parameter space in non-convex optimization, with applications to finding approximate local minima of the population risk and to learning linear classifiers under the zero-one loss.
Contribution
It introduces a hitting time analysis for SGLD, showing that the algorithm reaches an approximate local minimum of the population risk in polynomial time and improves on a known learnability result for linear classifiers under the zero-one loss.
Findings
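When the empirical risk is point-wise close to the smooth population risk, SGLD reaches an approximate local minimum of the population risk in polynomial time.
In doing so, SGLD escapes suboptimal local minima that exist only in the empirical risk.
SGLD improves on one of the best known learnability results for linear classifiers under the zero-one loss.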
Abstract
We study the Stochastic Gradient Langevin Dynamics (SGLD) algorithm for non-convex optimization. The algorithm performs stochastic gradient descent, where in each step it injects appropriately scaled Gaussian noise to the update. We analyze the algorithm's hitting time to an arbitrary subset of the parameter space. Two results follow from our general theory: First, we prove that for empirical risk minimization, if the empirical risk is point-wise close to the (smooth) population risk, then the algorithm achieves an approximate local minimum of the population risk in polynomial time, escaping suboptimal local minima that only exist in the empirical risk. Second, we show that SGLD improves on one of the best known learnability results for learning linear classifiers under the zero-one loss.
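The update and the hitting time described in the abstract can be illustrated with a minimal sketch. It assumes the standard discretized Langevin update x ← x − η·g(x) + sqrt(2η/ξ)·w with w drawn from a standard Gaussian; the step size η, the inverse temperature ξ, the toy risk functions, and every function name below are illustrative assumptions, not constants or code from the paper. The toy "empirical risk" is a smooth quadratic plus a small oscillation, which creates local minima that the quadratic "population risk" does not have.

```python
import math
import random


def hitting_time(grad, x0, in_target, step_size, inv_temp, max_steps, rng):
    """Iterate the noisy gradient update and return (k, x), where k is the
    first step at which x lies in the target set, or None if the target is
    not reached within max_steps. All names here are hypothetical."""
    x = x0
    if in_target(x):
        return 0, x
    # Noise scale of the assumed Langevin discretization; it is exactly 0.0
    # when inv_temp is infinite, which recovers plain gradient descent.
    noise_scale = math.sqrt(2.0 * step_size / inv_temp)
    for k in range(1, max_steps + 1):
        x = x - step_size * grad(x) + noise_scale * rng.gauss(0.0, 1.0)
        if in_target(x):
            return k, x
    return None, x


def empirical_grad(x):
    # Gradient of the toy empirical risk f(x) = 0.5*x**2 + 0.05*cos(20*x).
    # The toy population risk F(x) = 0.5*x**2 has a single minimum at 0;
    # the cosine term adds spurious local minima (for example near x = 0.74).
    return x - math.sin(20.0 * x)


def near_population_minimum(x):
    # Target set: a neighborhood of the minimum of the toy population risk.
    return abs(x) <= 0.2


rng = random.Random(0)

# Noiseless descent stalls in the first spurious minimum it meets.
gd_time, gd_x = hitting_time(
    empirical_grad, 2.0, near_population_minimum,
    step_size=0.01, inv_temp=math.inf, max_steps=20000, rng=rng)

# With injected Gaussian noise the iterate crosses the shallow barriers.
sgld_time, sgld_x = hitting_time(
    empirical_grad, 2.0, near_population_minimum,
    step_size=0.01, inv_temp=20.0, max_steps=100000, rng=rng)

print("gradient descent hitting time:", gd_time)
print("SGLD hitting time:", sgld_time)
```

In this toy setting the noiseless run never enters the target set, while the noisy run does. This only illustrates the qualitative behavior the abstract describes; it says nothing about the polynomial bounds proved in the paper.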
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Machine Learning and Algorithms
