Escaping Saddle Points with Stochastically Controlled Stochastic Gradient Methods

Guannan Liang; Qianqian Tong; Chunjiang Zhu; Jinbo Bi

arXiv:2103.04413·math.OC·April 26, 2021·1 cites

Open Access

TL;DR

This paper introduces CNC-SCSG, a method that adds a separate stochastic gradient descent (SGD) step to SCSG so that it escapes saddle points in nonconvex optimization and converges to second-order stationary points faster than CNC-SGD.

Contribution

The paper proposes CNC-SCSG, a novel algorithm that uses a separate SGD step to escape saddle points, with proven convergence rates independent of problem dimension.
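The idea of pairing variance-reduced SCSG epochs with an occasional plain SGD step can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy objective, the function names `make_problem` and `cnc_scsg_sketch`, the rule that triggers the SGD step when the batch gradient is small, and all step sizes and thresholds. The toy finite-sum objective has a strict saddle at the origin and minima at (0, ±1), and its per-sample gradient noise has a component along the negative-curvature direction.

```python
import numpy as np

def make_problem(n=1000, noise=0.05, seed=0):
    """Toy finite sum f(w) = mean_i f_i(w), with
    f_i(w) = 0.5*w0^2 - 0.5*w1^2 + 0.25*w1^4 + a_i . w  (hypothetical example).
    The origin is a strict saddle; (0, +1) and (0, -1) are minima."""
    rng = np.random.default_rng(seed)
    a = rng.normal(scale=noise, size=(n, 2))
    a -= a.mean(axis=0)  # center so the noise vanishes in the full gradient

    def grad(w, idx):
        # Average gradient of the components f_i, i in idx, at w.
        base = np.array([w[0], -w[1] + w[1] ** 3])
        return base + a[idx].mean(axis=0)

    return grad, n

def cnc_scsg_sketch(grad, n, w0, epochs=200, batch=64,
                    eta=0.05, r=0.5, g_thresh=0.05, seed=1):
    """Illustrative loop: SCSG epochs, plus one plain SGD step with a larger
    step size r whenever the batch gradient is small (assumed trigger rule)."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(epochs):
        B = rng.choice(n, size=batch, replace=False)
        g = grad(w, B)  # batch gradient at the snapshot point
        if np.linalg.norm(g) <= g_thresh:
            # Possibly near a saddle: take a separate SGD step so the
            # stochastic gradient noise can push along negative curvature.
            i = rng.integers(n)
            w = w - r * grad(w, [i])
            continue
        snapshot = w.copy()
        # Geometric number of inner steps with mean `batch`, as in SCSG.
        n_inner = rng.geometric(1.0 / (batch + 1)) - 1
        for _ in range(n_inner):
            i = rng.integers(n)
            # Variance-reduced gradient estimate anchored at the snapshot.
            v = grad(w, [i]) - grad(snapshot, [i]) + g
            w = w - eta * v
    return w

if __name__ == "__main__":
    grad, n = make_problem()
    w_final = cnc_scsg_sketch(grad, n, w0=[0.0, 0.0])  # start at the saddle
    print("final iterate:", w_final)
```

Started exactly at the saddle, the iterate should drift toward one of the two minima. The sketch has no stopping criterion, so it keeps taking SGD steps near a minimum as well; a practical variant would need a termination test.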

Findings

01

CNC-SCSG converges faster than CNC-SGD.

02

The method escapes saddle points with fewer epochs than perturbed gradient descent.

03

Convergence rate is $\tilde{O}(\epsilon^{-2} \log(1/\epsilon))$, independent of problem dimension.

Abstract

Stochastically controlled stochastic gradient (SCSG) methods have been proved to converge efficiently to first-order stationary points which, however, can be saddle points in nonconvex optimization. It has been observed that a stochastic gradient descent (SGD) step introduces anisotropic noise around saddle points for deep learning and nonconvex half-space learning problems, which indicates that SGD satisfies the correlated negative curvature (CNC) condition for these problems. Therefore, we propose to use a separate SGD step to help the SCSG method escape from strict saddle points, resulting in the CNC-SCSG method. The SGD step plays a role similar to noise injection but is more stable. We prove that the resultant algorithm converges to a second-order stationary point with a convergence rate of $\tilde{O}(\epsilon^{-2} \log(1/\epsilon))$ where $\epsilon$ is the pre-specified error…
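For readers unfamiliar with the two notions the abstract relies on, they are commonly formalized as follows. This is a sketch of the usual definitions from the literature, not a quotation of the paper: the symbols $\rho$ (Hessian Lipschitz constant), $\gamma$, $v_x$, and $f_z$ are assumed notation, and the paper's exact thresholds may differ.

```latex
% An \epsilon-second-order stationary point x (one common convention):
\|\nabla f(x)\| \le \epsilon
\quad\text{and}\quad
\lambda_{\min}\!\left(\nabla^2 f(x)\right) \ge -\sqrt{\rho\,\epsilon}.

% Correlated negative curvature (CNC): with v_x a unit eigenvector for the
% smallest eigenvalue of \nabla^2 f(x) and \nabla f_z(x) a stochastic gradient,
% there exists \gamma > 0 such that, for all x,
\mathbb{E}_z\!\left[\langle v_x, \nabla f_z(x)\rangle^2\right] \ge \gamma .
```

Under the CNC condition the stochastic gradient always has a nonvanishing component, in second moment, along the direction of most negative curvature, which is what allows a plain SGD step to stand in for injected isotropic noise.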

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent