Escaping Saddle Points with Stochastically Controlled Stochastic Gradient Methods
Guannan Liang, Qianqian Tong, Chunjiang Zhu, Jinbo Bi

TL;DR
This paper introduces CNC-SCSG, a method combining stochastic gradient steps with SCSG to efficiently escape saddle points in nonconvex optimization, achieving faster convergence to second-order stationary points.
Contribution
The paper proposes CNC-SCSG, a novel algorithm that uses a separate SGD step to escape saddle points, with proven convergence rates independent of problem dimension.
Findings
CNC-SCSG converges faster than CNC-SGD.
The method escapes saddle points with fewer epochs than perturbed gradient descent.
Convergence rate is $\tilde{O}(\epsilon^{-2}\log(1/\epsilon))$, independent of problem dimension.
Abstract
Stochastically controlled stochastic gradient (SCSG) methods have been proved to converge efficiently to first-order stationary points which, however, can be saddle points in nonconvex optimization. It has been observed that a stochastic gradient descent (SGD) step introduces anisotropic noise around saddle points for deep learning and non-convex half space learning problems, which indicates that SGD satisfies the correlated negative curvature (CNC) condition for these problems. Therefore, we propose to use a separate SGD step to help the SCSG method escape from strict saddle points, resulting in the CNC-SCSG method. The SGD step plays a role similar to noise injection but is more stable. We prove that the resultant algorithm converges to a second-order stationary point with a convergence rate of $\tilde{O}(\epsilon^{-2}\log(1/\epsilon))$, where $\epsilon$ is the pre-specified error…
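The idea in the abstract (run SCSG epochs, and take one plain SGD step when the gradient is small) can be illustrated with a minimal sketch. It is not the paper's algorithm or its constants. The toy objective, the function names (`make_problem`, `grad_batch`, `full_loss`, `scsg_epoch`, `cnc_scsg`), the thresholds `eps` and `f_thres`, the step sizes, and the stopping rule are all assumptions chosen for illustration. The toy finite sum has a strict saddle at the origin, where the full gradient vanishes but individual stochastic gradients still have a component along the negative-curvature direction.

```python
import numpy as np

def make_problem(n=64, seed=0):
    # Hypothetical toy finite sum: f_i(x) = 0.5*x0^2 - 0.5*x1^2 + 0.25*x1^4 + a_i . x
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(n, 2))
    a -= a.mean(axis=0)  # offsets average to ~0, so the origin is a strict saddle of f
    return a

def grad_batch(x, idx, a):
    # Gradient of the average of f_i over the indices in idx.
    base = np.array([x[0], -x[1] + x[1] ** 3])
    return base + a[idx].mean(axis=0)

def full_loss(x, a):
    return 0.5 * x[0] ** 2 - 0.5 * x[1] ** 2 + 0.25 * x[1] ** 4 + a.mean(axis=0) @ x

def scsg_epoch(x, a, rng, batch=None, mini=1, lr=0.05):
    # One SCSG-style epoch: a batch gradient at a snapshot, then a geometrically
    # distributed number of variance-reduced mini-batch steps.
    n = len(a)
    batch = n if batch is None else batch  # assumption: full batch keeps the toy exact
    B = rng.choice(n, size=batch, replace=False)
    snapshot = x.copy()
    g = grad_batch(snapshot, B, a)
    # numpy's geometric counts trials (>= 1); subtracting 1 gives mean batch / mini.
    n_inner = rng.geometric(mini / (batch + mini)) - 1
    for _ in range(n_inner):
        I = rng.choice(n, size=mini, replace=False)
        v = grad_batch(x, I, a) - grad_batch(snapshot, I, a) + g
        x = x - lr * v
    return x

def cnc_scsg(x0, a, eps=1e-3, sgd_lr=0.1, f_thres=1e-2,
             escape_epochs=20, max_rounds=50, seed=1):
    rng = np.random.default_rng(seed)
    n = len(a)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_rounds):
        if np.linalg.norm(grad_batch(x, np.arange(n), a)) > eps:
            x = scsg_epoch(x, a, rng)  # gradient still large: ordinary SCSG epoch
            continue
        # Small gradient: x is a minimum or a saddle. Take one plain SGD step,
        # whose noise a_i has a component along the negative-curvature direction.
        i = rng.integers(n, size=1)
        y = x - sgd_lr * grad_batch(x, i, a)
        for _ in range(escape_epochs):
            y = scsg_epoch(y, a, rng)
        if full_loss(x, a) - full_loss(y, a) < f_thres:
            return x  # no meaningful decrease: accept x
        x = y
    return x
```

Calling `cnc_scsg([0.0, 0.0], make_problem())` starts exactly at the saddle. Under these assumptions the single SGD step moves the iterate off the saddle, and the following SCSG epochs should carry it close to one of the two minimizers at $(0, \pm 1)$. A second SGD step taken there produces no further decrease, so the loop stops.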
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
Methods: Stochastic Gradient Descent
