Beyond Convexity: Stochastic Quasi-Convex Optimization

Elad Hazan; Kfir Y. Levy; Shai Shalev-Shwartz

arXiv:1507.02030·cs.LG·October 29, 2015·50 cites

Open Access

TL;DR

This paper extends stochastic gradient methods to quasi-convex, locally-Lipschitz functions, proving convergence of a normalized gradient descent variant that handles saddle points and gradient explosions.

Contribution

It analyzes a stochastic version of normalized gradient descent and proves its convergence for a class of functions broader than convex ones: functions that are quasi-convex and locally Lipschitz.

Findings

01

Convergence to global minimum for quasi-convex functions.

02

Unlike vanilla SGD, stochastic normalized gradient descent provably requires a minimal minibatch size.

03

Handles saddle points and gradient explosions effectively.

Abstract

Stochastic convex optimization is a basic and well-studied primitive in machine learning. It is well known that convex and Lipschitz functions can be minimized efficiently using Stochastic Gradient Descent (SGD). The Normalized Gradient Descent (NGD) algorithm is an adaptation of Gradient Descent that updates according to the direction of the gradients rather than the gradients themselves. In this paper we analyze a stochastic version of NGD and prove its convergence to a global minimum for a wider class of functions: we require the functions to be quasi-convex and locally Lipschitz. Quasi-convexity broadens the concept of unimodality to multiple dimensions and allows for certain types of saddle points, which are a known hurdle for first-order optimization methods such as gradient descent. Locally-Lipschitz functions are only required to be Lipschitz in a small region around the…
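
As a rough illustration of the idea described in the abstract (stepping along the direction of a minibatch gradient rather than the gradient itself), the update can be sketched in Python as follows. This is a minimal sketch, not the paper's algorithm as specified: the function name `sngd`, the fixed step size, the batch size, the step count, and the toy objective are all assumptions made for illustration, and the toy objective is a simple convex quadratic chosen only because its minimizer is easy to check.

```python
import numpy as np

def sngd(grad_fn, x0, data, lr=0.05, batch_size=32, steps=500, seed=0):
    """Minibatch normalized gradient descent (illustrative sketch).

    grad_fn(x, batch) is a hypothetical callback returning a gradient
    estimate computed on the given minibatch.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        # Sample a minibatch without replacement.
        idx = rng.choice(len(data), size=batch_size, replace=False)
        g = grad_fn(x, data[idx])
        norm = np.linalg.norm(g)
        if norm == 0.0:
            break  # zero gradient estimate: no direction to follow
        # Normalized step: only the direction of g is used, so every
        # update moves exactly lr regardless of the gradient's magnitude.
        x -= lr * g / norm
    return x

# Toy objective (an assumption): f(x) = mean_i 0.5 * ||x - a_i||^2,
# whose minibatch gradient is x - mean(batch).
data_rng = np.random.default_rng(1)
data = np.array([1.0, 1.0]) + 0.1 * data_rng.standard_normal((256, 2))

def quadratic_grad(x, batch):
    return x - batch.mean(axis=0)

x_final = sngd(quadratic_grad, x0=[3.0, -2.0], data=data)
dist = np.linalg.norm(x_final - data.mean(axis=0))
```

Because each step has length exactly `lr`, the last iterate in this sketch can only hover within roughly one step of the minimizer; a smaller or decaying step size would tighten this, and the choice made here is purely for simplicity.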

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research

Methods: Stochastic Gradient Descent