Tuning-Free Stochastic Optimization

Ahmed Khaled; Chi Jin

arXiv:2402.07793 · math.OC · March 20, 2024 · 1 citation

Open Access

TL;DR

This paper formalizes tuning-free stochastic optimization algorithms, which match the performance of optimally-tuned methods up to polylogarithmic factors given only loose hints on the relevant problem parameters. The motivation is the growing cost of hyperparameter tuning in large-scale machine learning.

Contribution

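It formalizes tuning-free algorithms, shows that matching optimally-tuned SGD is achievable over bounded domains, proves that it is impossible over unbounded domains for convex smooth or Lipschitz objectives, discusses conditions under which it becomes possible there, and introduces variants that match tuned SGD performance with minimal additional cost.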

Findings

1. Tuning-free algorithms can match optimally-tuned SGD in bounded domains.

2. Tuning-free optimization is impossible over unbounded domains without additional assumptions.

3. Certain algorithms like DoG and DoWG are tuning-free under specific noise conditions.
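
The third finding names DoG without stating its update. As a rough illustration, the sketch below assumes the commonly cited Distance-over-Gradients rule: the step size is the largest distance travelled from the starting point, divided by the square root of the running sum of squared gradient norms. The function names `dog_sgd` and `noisy_quadratic_grad`, the `r_eps` default, and the quadratic objective are hypothetical choices made for illustration and are not taken from the paper.

```python
import math
import random


def noisy_quadratic_grad(target, sigma):
    """Hypothetical oracle: stochastic gradient of f(x) = 0.5 * ||x - target||^2."""
    def grad(x, rng):
        # Exact gradient (x - target) plus independent Gaussian noise per coordinate.
        return [xi - ti + rng.gauss(0.0, sigma) for xi, ti in zip(x, target)]
    return grad


def dog_sgd(grad_fn, x0, steps, r_eps=1e-6, seed=0):
    """SGD with a DoG-style step size (assumed rule, see the text above).

    eta_t = max_{i <= t} ||x_i - x0|| / sqrt(sum_{i <= t} ||g_i||^2),
    where the initial "distance" is seeded with r_eps * (1 + ||x0||).
    """
    rng = random.Random(seed)
    x = list(x0)
    x0_norm = math.sqrt(sum(v * v for v in x0))
    max_dist = r_eps * (1.0 + x0_norm)  # tiny initial movement, not a tuned step size
    grad_sq_sum = 0.0

    for _ in range(steps):
        g = grad_fn(x, rng)
        grad_sq_sum += sum(gi * gi for gi in g)
        if grad_sq_sum == 0.0:
            break  # every gradient so far was exactly zero: nothing to do
        eta = max_dist / math.sqrt(grad_sq_sum)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        # Track the largest distance from the starting point seen so far.
        dist = math.sqrt(sum((xi - x0i) ** 2 for xi, x0i in zip(x, x0)))
        max_dist = max(max_dist, dist)
    return x


if __name__ == "__main__":
    oracle = noisy_quadratic_grad(target=[3.0, -2.0], sigma=0.1)
    print(dog_sgd(oracle, x0=[0.0, 0.0], steps=2000))
```

No learning rate is passed in: the only user-facing constant is `r_eps`, which plays the role of a loose hint. For brevity the sketch returns the last iterate; published analyses of such methods typically concern an averaged iterate.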

Abstract

Large-scale machine learning problems make the cost of hyperparameter tuning ever more prohibitive. This creates a need for algorithms that can tune themselves on-the-fly. We formalize the notion of "tuning-free" algorithms that can match the performance of optimally-tuned optimization algorithms up to polylogarithmic factors given only loose hints on the relevant problem parameters. We consider in particular algorithms that can match optimally-tuned Stochastic Gradient Descent (SGD). When the domain of optimization is bounded, we show tuning-free matching of SGD is possible and achieved by several existing algorithms. We prove that for the task of minimizing a convex and smooth or Lipschitz function over an unbounded domain, tuning-free optimization is impossible. We discuss conditions under which tuning-free optimization is possible even over unbounded domains. In particular, we show…
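
To make "matching optimally-tuned SGD up to polylogarithmic factors" concrete, here is a sketch for the convex Lipschitz case using the textbook SGD bound. The symbols $D$, $G$, $T$, $\iota$ and the hint intervals are notation assumed here for illustration; the paper's formal definition may differ in its details.

```latex
% Textbook bound for SGD with constant step size \eta on a convex objective,
% assuming \|x_0 - x^\star\| \le D and stochastic gradients with
% second moment at most G^2; \bar{x}_T is the average of the first T iterates.
\mathbb{E}\big[f(\bar{x}_T) - f(x^\star)\big]
  \;\le\; \frac{D^2}{2\eta T} + \frac{\eta G^2}{2}.

% Minimizing the right-hand side over \eta requires knowing D and G:
\eta^\star = \frac{D}{G\sqrt{T}}
  \quad\Longrightarrow\quad
\mathbb{E}\big[f(\bar{x}_T) - f(x^\star)\big] \;\le\; \frac{DG}{\sqrt{T}}.

% Illustrative tuning-free target: given only loose hints
% \underline{D} \le D \le \overline{D} and \underline{G} \le G \le \overline{G},
% the output \hat{x}_T should satisfy
f(\hat{x}_T) - f(x^\star) \;\le\; \iota \cdot \frac{DG}{\sqrt{T}},
  \qquad
\iota = \operatorname{polylog}\!\left(
  \frac{\overline{D}}{\underline{D}},\; \frac{\overline{G}}{\underline{G}},\; T
\right).
```

The point of the comparison is that the hints enter only through $\iota$, so even hints that are off by several orders of magnitude cost at most a logarithmic factor.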

Taxonomy

Topics: Simulation Techniques and Applications

Methods: Stochastic Gradient Descent