ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm

Chris Junchi Li; Wenlong Mou; Martin J. Wainwright; Michael I. Jordan

arXiv:2008.12690 · math.OC · September 19, 2024



TL;DR

ROOT-SGD is a new stochastic optimization algorithm that achieves both sharp finite-sample bounds and near-optimal asymptotic convergence, effectively combining nonasymptotic and asymptotic analysis.

Contribution

The paper introduces ROOT-SGD, a recursive averaging method that attains state-of-the-art finite-sample risk bounds and asymptotic normality with optimal covariance in strongly convex optimization.
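The "one-over-t" in the name presumably refers to weighting the newest term by $1/t$. As a minimal illustration of that building block only (not the full algorithm), the recursion $a_t = (1 - 1/t)\,a_{t-1} + (1/t)\,s_t$ reproduces the running mean of a sequence without storing past terms. The helper `one_over_t_average` and the synthetic "gradients" are hypothetical.

```python
import random

def one_over_t_average(samples):
    """Running mean via a_t = (1 - 1/t) * a_{t-1} + (1/t) * s_t (hypothetical helper)."""
    avg = 0.0
    for t, s in enumerate(samples, start=1):
        # At t = 1 the old value gets weight 0, so avg becomes s_1.
        avg = (1.0 - 1.0 / t) * avg + s / t
    return avg

# Synthetic scalar "stochastic gradients" for illustration only.
rng = random.Random(0)
grads = [rng.gauss(0.0, 1.0) for _ in range(1000)]
print(abs(one_over_t_average(grads) - sum(grads) / len(grads)) < 1e-9)  # True
```

Averaging raw gradients this way only helps if they are all taken at the same point; once the iterate moves, older gradients go stale, which is the issue ROOT-SGD's recursion is designed to address.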

Findings

01

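Achieves last-iterate risk bounds whose leading-order term matches the optimal statistical risk with a unity pre-factor, plus a higher-order term that scales at the sharp rate $O(n^{-3/2})$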

02

The rescaled last iterate of (multi-epoch) ROOT-SGD converges to a Gaussian limit with Cramér-Rao optimal covariance

03

Delivers both the finite-sample and the asymptotic guarantees with a single algorithm

Abstract

We study the problem of solving strongly convex and smooth unconstrained optimization problems using stochastic first-order algorithms. We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense. On the nonasymptotic side, we prove risk bounds on the last iterate of ROOT-SGD with leading-order terms that match the optimal statistical risk with a unity pre-factor, along with a higher-order term that scales at the sharp rate of $O(n^{-3/2})$ under the Lipschitz condition on the Hessian matrix. On the asymptotic side, we show that when a mild, one-point Hessian continuity condition is imposed, the rescaled last iterate of (multi-epoch) ROOT-SGD…
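To make "recursive averaging of past stochastic gradients" concrete, here is a minimal single-epoch sketch. It assumes the recursion $v_1 = \nabla f(x_0;\xi_1)$, $v_t = \nabla f(x_{t-1};\xi_t) + (1 - 1/t)\bigl(v_{t-1} - \nabla f(x_{t-2};\xi_t)\bigr)$, $x_t = x_{t-1} - \eta v_t$. This is our reading of the arXiv paper, not something stated on this page, and the sketch omits the multi-epoch variant and any step-size or initialization prescriptions. The scalar least-squares problem and the names `sample`, `grad`, `root_sgd`, `x_star`, and `noise` are hypothetical.

```python
import random

def sample(rng, x_star=2.0, noise=0.5):
    """Draw one hypothetical data point (a, b) with b = a * x_star + noise."""
    a = rng.uniform(0.5, 1.5)
    b = a * x_star + noise * rng.gauss(0.0, 1.0)
    return a, b

def grad(x, a, b):
    """Stochastic gradient of the per-sample loss 0.5 * (a * x - b) ** 2."""
    return a * (a * x - b)

def root_sgd(x0, eta, n, rng):
    """Assumed single-epoch ROOT-SGD recursion; returns the last iterate x_n."""
    x_prev = x0                      # x_0
    a, b = sample(rng)
    v = grad(x_prev, a, b)           # v_1: plain stochastic gradient at x_0
    x = x_prev - eta * v             # x_1
    for t in range(2, n + 1):
        a, b = sample(rng)           # fresh sample xi_t
        # Evaluate the SAME sample at x_{t-1} (= x) and x_{t-2} (= x_prev),
        # then fold the difference into the (1 - 1/t)-weighted running estimate.
        v = grad(x, a, b) + (1.0 - 1.0 / t) * (v - grad(x_prev, a, b))
        x_prev, x = x, x - eta * v   # x_t = x_{t-1} - eta * v_t
    return x

rng = random.Random(0)
x_hat = root_sgd(x0=0.0, eta=0.1, n=10_000, rng=rng)
print(f"{x_hat:.3f}")  # expected to be close to x_star = 2.0
```

The design choice worth noticing is that each fresh sample is evaluated at two consecutive iterates: the difference corrects $v_{t-1}$ for the move from $x_{t-2}$ to $x_{t-1}$, so the $1/t$ average tracks the gradient at the current point instead of a stale mixture of old gradients, while still averaging out the sampling noise.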


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms

Methods: Stochastic Gradient Descent