Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression

Francis Bach (INRIA Paris - Rocquencourt; LIENS)

arXiv:1303.6149 · math.ST · March 18, 2014 · 40 cites

TL;DR

This paper demonstrates that averaged stochastic gradient descent adapts to local strong convexity in logistic regression, achieving improved convergence rates without prior knowledge of the local curvature.

Contribution

It proves that averaged stochastic gradient descent automatically adapts to unknown local strong convexity in logistic regression, extending to generalized linear models.

Findings

01

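Convergence rate after N iterations is always O(1/√N), using a constant step-size proportional to 1/(R²√N), where R is the maximum norm of the observations.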

02

Convergence rate improves to O(R² / (μN)) when local strong convexity is present, where μ is the lowest eigenvalue of the Hessian at the global optimum.

03

Method is adaptive and does not require prior knowledge of the Hessian's eigenvalues.

Abstract

In this paper, we consider supervised learning problems such as logistic regression and study the stochastic gradient method with averaging, in the usual stochastic approximation setting where observations are used only once. We show that after $N$ iterations, with a constant step-size proportional to $1/(R^{2} \sqrt{N})$ where $N$ is the number of observations and $R$ is the maximum norm of the observations, the convergence rate is always of order $O(1/\sqrt{N})$, and improves to $O(R^{2} / (\mu N))$ where $\mu$ is the lowest eigenvalue of the Hessian at the global optimum (when this eigenvalue is greater than $R^{2} / \sqrt{N}$). Since $\mu$ does not need to be known in advance, this shows that averaged stochastic gradient is adaptive to unknown local strong convexity of the objective function. Our proof relies on the generalized self-concordance properties of the logistic loss and thus…
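The procedure the abstract describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's reference implementation: a single pass over the data, a constant step-size proportional to 1/(R²√N) (the scaling that goes with the O(1/√N) rate), and a running average of the iterates. The function names, the `step_scale` constant, the zero initialization, and the choice to compute R from the data itself are all assumptions made for this example. Labels are taken to be in {-1, +1}.

```python
import numpy as np


def averaged_sgd_logistic(X, y, step_scale=0.5):
    """Single-pass averaged SGD for logistic regression, labels in {-1, +1}.

    step_scale is a hypothetical constant: the abstract only states that the
    step-size is proportional to 1 / (R^2 * sqrt(N)).
    """
    n, d = X.shape
    # R^2: largest squared norm of an observation (computed from the data here,
    # which is an assumption; in a streaming setting R would be a known bound).
    R2 = np.max(np.sum(X ** 2, axis=1))
    gamma = step_scale / (R2 * np.sqrt(n))  # constant step-size, mu is never used
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    for k in range(n):  # each observation is used exactly once
        margin = y[k] * (X[k] @ theta)
        # 1 / (1 + exp(margin)), written with tanh for numerical stability
        weight = 0.5 * (1.0 - np.tanh(0.5 * margin))
        grad = -y[k] * X[k] * weight  # gradient of log(1 + exp(-margin))
        theta = theta - gamma * grad
        theta_bar += (theta - theta_bar) / (k + 1)  # running average of iterates
    return theta_bar, gamma


def logistic_loss(X, y, theta):
    """Average logistic loss log(1 + exp(-y * <x, theta>)) over the rows of X."""
    return np.mean(np.logaddexp(0.0, -y * (X @ theta)))
```

Calling `averaged_sgd_logistic(X, y)` returns the averaged iterate and the step-size that was used. Note that nothing in the loop depends on μ: the same step-size is used whether or not the problem turns out to be locally strongly convex, which is the sense in which the method is adaptive.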

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Statistical Methods and Inference