Making SGD Parameter-Free

Yair Carmon; Oliver Hinder

arXiv:2205.02160 · math.OC · March 4, 2024

TL;DR

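This paper introduces a simple parameter-free algorithm for stochastic convex optimization with high-probability guarantees. Its convergence rate is within a double-logarithmic factor of the optimal known-parameter rate, avoiding the excess logarithmic factors of previous parameter-free methods, and it partially adapts to unknown problem parameters.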

Contribution

The paper presents a novel parameter-free certificate for choosing the SGD step size and a time-uniform concentration result that requires no a priori bounds on the SGD iterates; together these improve on the convergence guarantees of previous parameter-free methods.

Findings

01

Achieves a near-optimal convergence rate with parameter-free SGD, within a double-logarithmic factor of the known-parameter optimum.

02

Provides high-probability guarantees and partial adaptivity to unknown gradient norms, smoothness, and strong convexity.

03

Introduces a time-uniform concentration result that assumes no a priori bounds on the SGD iterates.

Abstract

We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting. In contrast, the best previously known rates for parameter-free SCO are based on online parameter-free regret bounds, which contain unavoidable excess logarithmic terms compared to their known-parameter counterparts. Our algorithm is conceptually simple, has high-probability guarantees, and is also partially adaptive to unknown gradient norms, smoothness, and strong convexity. At the heart of our results is a novel parameter-free certificate for SGD step size choice, and a time-uniform concentration result that assumes no a-priori bounds on SGD iterates.
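To make the idea of a step-size certificate more concrete, the Python sketch below shows a toy version: run fixed-step SGD for a candidate step size eta, compute a statistic phi(eta) from the trajectory, and bisect over log(eta) until eta and phi(eta) roughly agree. This is an illustration under stated assumptions, not the paper's algorithm. The ratio phi(eta) = (maximum distance of the iterates from x0) / sqrt(sum of squared stochastic gradient norms) is a simplified stand-in chosen for this example, the paper's constants and logarithmic factors are omitted, and the names run_sgd, certificate_ratio, and bisect_log_eta are hypothetical. Bisecting in log space needs only about log2(log(eta_hi/eta_lo)) trials to locate eta up to a constant factor, which gives some intuition (our reading, not the paper's analysis) for how searching a very wide step-size range can cost only a double-logarithmic overhead.

```python
import numpy as np

# Toy illustration only: phi(eta) below is a simplified stand-in for a
# step-size certificate (an assumption for this example), not the exact
# quantity, constants, or log factors used in the paper.

def run_sgd(grad_fn, x0, eta, steps, rng):
    """Run fixed-step SGD; return (max distance from x0, sum of squared gradient norms)."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    max_dist = 0.0
    grad_sq_sum = 0.0
    for _ in range(steps):
        g = grad_fn(x, rng)              # stochastic gradient at the current iterate
        grad_sq_sum += float(np.dot(g, g))
        x = x - eta * g                  # plain SGD step with a constant step size
        max_dist = max(max_dist, float(np.linalg.norm(x - x0)))
    return max_dist, grad_sq_sum

def certificate_ratio(grad_fn, x0, eta, steps, seed=0):
    """Hypothetical phi(eta) = max distance traveled / sqrt(sum of squared gradient norms)."""
    rng = np.random.default_rng(seed)    # fixed seed: every eta gets the same sequence of random draws
    r, g_sq = run_sgd(grad_fn, x0, eta, steps, rng)
    return r / np.sqrt(g_sq) if g_sq > 0 else np.inf

def bisect_log_eta(grad_fn, x0, eta_lo, eta_hi, steps, iters=40):
    """Bisect over log(eta) for the point where eta stops being below phi(eta)."""
    lo, hi = np.log(eta_lo), np.log(eta_hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        eta = float(np.exp(mid))
        phi = certificate_ratio(grad_fn, x0, eta, steps)
        if eta > phi:
            hi = mid                     # eta exceeds the trajectory statistic: shrink it
        else:
            lo = mid                     # eta is still certified: try larger step sizes
    return float(np.exp(0.5 * (lo + hi)))
```

On the deterministic quadratic f(x) = x^2/2 (gradient x, smoothness constant 1) started at x0 = 10, with eta searched in [1e-4, 5] and 100 SGD steps per trial, this toy bisection settles near eta = 1. A noisy oracle such as lambda x, rng: x + 0.1 * rng.standard_normal(x.shape) can be substituted to mimic stochastic gradients.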

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent