Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)

Francis Bach (INRIA Paris - Rocquencourt; LIENS); Eric Moulines (LTCI)

arXiv:1306.2119 · cs.LG · June 11, 2013 · 229 cites

TL;DR

This paper analyzes two stochastic approximation algorithms that achieve an O(1/n) convergence rate for function values on least-squares and logistic regression without assuming strong convexity, improving on the previously known O(1/√n) rate.

Contribution

The authors analyze two algorithms that attain an O(1/n) convergence rate on smooth convex problems without strong convexity: averaged stochastic gradient descent with constant step-size for least-squares regression, and a novel stochastic gradient algorithm for logistic regression.

Findings

1. Averaged stochastic gradient descent with constant step-size achieves O(1/n) for least-squares regression.

2. A new stochastic gradient algorithm constructs local quadratic approximations for logistic regression.

3. Extensive experiments show the proposed algorithms often outperform existing methods.
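
The averaged constant-step-size scheme for least squares named in the findings above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the synthetic Gaussian data, the noise level, and the step size of 0.05 are all assumptions chosen for the demo.

```python
import random


def averaged_sgd_least_squares(samples, dim, step_size):
    """Constant-step-size SGD on the squared loss; returns the averaged iterate."""
    theta = [0.0] * dim      # current iterate theta_n
    theta_bar = [0.0] * dim  # running average of theta_0, ..., theta_n
    for n, (x, y) in enumerate(samples, start=1):
        # Stochastic gradient of 0.5 * (<theta, x> - y)^2 is residual * x.
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        theta = [t - step_size * residual * xi for t, xi in zip(theta, x)]
        # Incremental mean over the n + 1 iterates seen so far.
        theta_bar = [tb + (t - tb) / (n + 1) for tb, t in zip(theta_bar, theta)]
    return theta_bar


def synthetic_regression(n, theta_star, noise_std, seed):
    """Hypothetical data generator: Gaussian features, linear model plus noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in theta_star]
        y = sum(t * xi for t, xi in zip(theta_star, x)) + rng.gauss(0.0, noise_std)
        data.append((x, y))
    return data


theta_star = [1.0, -2.0, 0.5]  # assumed ground truth for the demo
samples = synthetic_regression(5000, theta_star, 0.1, seed=0)
estimate = averaged_sgd_least_squares(samples, dim=3, step_size=0.05)
print(estimate)
```

The step size stays fixed for the whole run; only the returned average changes how the noisy iterates are summarized. Each observation is used once, so the number of samples and the number of iterations coincide.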

Abstract

We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which includes machine learning methods based on the minimization of the empirical risk. We focus on problems without strong convexity, for which all previously known algorithms achieve a convergence rate for function values of O(1/n^{1/2}). We consider and analyze two algorithms that achieve a rate of O(1/n) for classical supervised learning problems. For least-squares regression, we show that averaged stochastic gradient descent with constant step-size achieves the desired rate. For logistic regression, this is achieved by a simple novel stochastic gradient algorithm that (a) constructs successive local quadratic approximations of the loss functions, while (b) preserving the same running time…
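
One possible reading of point (a) for logistic regression is sketched below: at each step the logistic loss is replaced by its second-order Taylor expansion around the current averaged iterate, and a constant-step-size gradient step is taken on that quadratic. The choice of the averaged iterate as the expansion point, the labels in {-1, +1}, the function names, the synthetic data, and the step size are assumptions made for illustration; the text above does not spell out these details.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def quadratic_approx_sgd_logistic(samples, dim, step_size):
    """SGD on local quadratic approximations of the logistic loss (sketch)."""
    theta = [0.0] * dim      # current iterate
    theta_bar = [0.0] * dim  # averaged iterate, used as expansion point (assumption)
    for n, (x, y) in enumerate(samples, start=1):
        s_bar = dot(theta_bar, x)
        # First derivative of log(1 + exp(-y * s)) in s, evaluated at s_bar.
        grad_scale = -y * sigmoid(-y * s_bar)
        # Second derivative in s, evaluated at s_bar.
        curvature = sigmoid(s_bar) * (1.0 - sigmoid(s_bar))
        # Gradient of the quadratic model at theta is coeff * x.
        coeff = grad_scale + curvature * (dot(theta, x) - s_bar)
        theta = [t - step_size * coeff * xi for t, xi in zip(theta, x)]
        theta_bar = [tb + (t - tb) / (n + 1) for tb, t in zip(theta_bar, theta)]
    return theta_bar


def synthetic_classification(n, theta_star, seed):
    """Hypothetical data generator: Gaussian features, logistic labels in {-1, +1}."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in theta_star]
        y = 1 if rng.random() < sigmoid(dot(theta_star, x)) else -1
        data.append((x, y))
    return data


logit_truth = [1.0, -1.0]  # assumed ground truth for the demo
logit_samples = synthetic_classification(20000, logit_truth, seed=0)
logit_estimate = quadratic_approx_sgd_logistic(logit_samples, dim=2, step_size=0.05)
print(logit_estimate)
```

In this sketch every update needs only two inner products with the current feature vector, because the Hessian of the per-sample loss is a rank-one matrix and is never formed explicitly.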

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms