Beyond Convexity: Stochastic Quasi-Convex Optimization
Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz

TL;DR
This paper extends stochastic gradient methods to quasi-convex, locally-Lipschitz functions, proving convergence of a normalized gradient descent variant that handles saddle points and gradient explosions.
Contribution
It introduces a stochastic normalized gradient descent algorithm with proven convergence for a broader class of functions than convex ones, including quasi-convex and locally-Lipschitz functions.
Findings
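Stochastic NGD provably converges to a global minimum for functions that are quasi-convex and locally-Lipschitz.
Unlike vanilla SGD, stochastic NGD provably requires a minimum minibatch size.
Quasi-convexity admits certain types of saddle points, and local Lipschitzness admits gradient explosions away from the optimum.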
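For reference, the two notions the findings rest on can be written out as follows. This is the standard textbook definition of quasi-convexity and the usual form of a normalized gradient step; the symbols $\eta$ (step size) and $g_t$ (gradient or minibatch gradient estimate at step $t$) are notation assumed here, not taken from the summary above.

```latex
% Quasi-convexity (standard definition): every sublevel set is convex, i.e.
\[
  f\bigl(\lambda x + (1-\lambda) y\bigr) \;\le\; \max\{ f(x),\, f(y) \}
  \qquad \text{for all } x, y \text{ and } \lambda \in [0,1].
\]

% Normalized gradient step: only the direction of g_t is used.
\[
  x_{t+1} \;=\; x_t \;-\; \eta \, \frac{g_t}{\lVert g_t \rVert}.
\]
```

Every convex function is quasi-convex, but not the reverse: for example, $f(x) = \sqrt{\lVert x \rVert}$ is quasi-convex without being convex.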
Abstract
Stochastic convex optimization is a basic and well-studied primitive in machine learning. It is well known that convex and Lipschitz functions can be minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent that updates according to the direction of the gradients, rather than the gradients themselves. In this paper we analyze a stochastic version of NGD and prove its convergence to a global minimum for a wider class of functions: we require the functions to be quasi-convex and locally-Lipschitz. Quasi-convexity broadens the concept of unimodality to multidimensions and allows for certain types of saddle points, which are a known hurdle for first-order optimization methods such as gradient descent. Locally-Lipschitz functions are only required to be Lipschitz in a small region around the…
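To make the idea of updating along the gradient direction concrete, here is a minimal Python sketch of a stochastic normalized gradient step with minibatch averaging. It is an illustration under stated assumptions, not the authors' exact algorithm: the function names (`sngd`, `noisy_grad`), the test objective f(x) = sqrt(||x - TARGET||), the Gaussian noise model, the fixed step size, and the choice to return the final iterate are all hypothetical choices made for this example.

```python
import math
import random

TARGET = [1.0, 1.0]  # hypothetical minimizer used only for this demo


def noisy_grad(x, rng, noise=0.1):
    """One noisy gradient sample of f(x) = sqrt(||x - TARGET||).

    f is quasi-convex but not convex, and its gradient norm blows up
    as x approaches TARGET. The additive Gaussian noise is an assumption.
    """
    diff = [xi - ti for xi, ti in zip(x, TARGET)]
    d = math.sqrt(sum(v * v for v in diff))
    if d == 0.0:
        return [0.0] * len(x)  # gradient undefined at the minimizer
    scale = 0.5 * d ** -1.5  # d/dx sqrt(||x - t||) = 0.5 * (x - t) / ||x - t||^1.5
    return [scale * v + rng.gauss(0.0, noise) for v in diff]


def sngd(grad_sample, x0, steps, lr, batch_size, rng):
    """Stochastic normalized gradient descent (sketch).

    grad_sample(x, rng) returns one noisy gradient estimate at x.
    Returns the final iterate, which is a simplification for this demo.
    """
    x = list(x0)
    for _ in range(steps):
        # Average a minibatch of noisy gradient estimates.
        g = [0.0] * len(x)
        for _ in range(batch_size):
            sample = grad_sample(x, rng)
            g = [a + b / batch_size for a, b in zip(g, sample)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm == 0.0:
            break  # zero estimate: no direction to follow
        # Step along the direction only; the magnitude is discarded.
        x = [xi - lr * gi / norm for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    x_final = sngd(noisy_grad, [3.0, -2.0], steps=1000, lr=0.01,
                   batch_size=16, rng=rng)
    print(x_final)
```

Because every step has length `lr` regardless of the gradient's magnitude, the exploding gradients near `TARGET` do not cause large jumps; with a fixed step size the iterate simply hovers within roughly `lr` of the minimizer.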
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research
Methods: Stochastic Gradient Descent
