A Generic Approach for Escaping Saddle points
Sashank J Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, Alexander J Smola

TL;DR
This paper proposes a generic framework that efficiently escapes saddle points in nonconvex optimization by combining first- and second-order methods, reducing Hessian computations while ensuring convergence.
Contribution
The authors introduce a novel framework that minimizes Hessian computations and provably converges to second-order critical points by alternating between first- and second-order subroutines.
Findings
Framework effectively escapes saddle points with reduced Hessian computations.
Convergence results are competitive with existing second-order methods.
Empirical results demonstrate practical efficiency of the proposed approach.
Abstract
A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, greatly deteriorating their performance. Typically, to escape from saddles one has to use second-order methods. However, most works on second-order methods rely extensively on expensive Hessian-based computations, making them impractical in large-scale settings. To tackle this challenge, we introduce a generic framework that minimizes Hessian-based computations while at the same time provably converging to second-order critical points. Our framework carefully alternates between a first-order and a second-order subroutine, using the latter only close to saddle points, and yields convergence results competitive with the state of the art. Empirical results suggest that our strategy also enjoys good practical performance.
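The alternation described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm or its subroutines: plain gradient descent stands in for the first-order subroutine, a step along the most negative Hessian eigenvector stands in for the second-order subroutine, and the test function, the function name escape_saddles, and all step sizes and tolerances are assumptions chosen only for illustration. The Hessian is evaluated only when the gradient is already small, which is the sense in which Hessian work is kept low.

```python
import numpy as np

# Hypothetical test function: saddle at (0, 0), minima at (0, +/-sqrt(2)).
def f(x):
    return x[0] ** 2 - x[1] ** 2 + 0.25 * x[1] ** 4

def grad(x):
    return np.array([2.0 * x[0], -2.0 * x[1] + x[1] ** 3])

def hess(x):
    return np.array([[2.0, 0.0], [0.0, -2.0 + 3.0 * x[1] ** 2]])

def escape_saddles(x0, lr=0.1, grad_tol=1e-6, curv_tol=1e-3,
                   escape_step=0.5, max_iter=10_000):
    """Toy alternation between a gradient step and a negative-curvature step.

    All parameter values are illustrative assumptions, not taken from the paper.
    Returns the final iterate and the number of Hessian evaluations.
    """
    x = np.asarray(x0, dtype=float)
    hessian_calls = 0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) > grad_tol:
            # First-order phase: ordinary gradient descent step.
            x = x - lr * g
            continue
        # Gradient is tiny: only now pay for a Hessian and check curvature.
        hessian_calls += 1
        eigvals, eigvecs = np.linalg.eigh(hess(x))  # eigenvalues ascending
        if eigvals[0] >= -curv_tol:
            break  # approximately a second-order critical point
        v = eigvecs[:, 0]  # direction of most negative curvature
        if g @ v > 0:
            v = -v  # orient the step so it does not go uphill
        x = x + escape_step * v
    return x, hessian_calls

# Start on the line x[1] = 0, where gradient descent alone heads to the saddle.
x_final, n_hess = escape_saddles([1.0, 0.0])
print(x_final, f(x_final), n_hess)
```

Started from (1, 0), gradient descent by itself converges to the saddle at the origin because the second coordinate never moves. The curvature check detects the negative eigenvalue there, steps off the saddle, and the first-order phase then carries the iterate to one of the two minima, using only a handful of Hessian evaluations along the way.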
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Optimization Algorithms Research
