Tuning-Free Stochastic Optimization
Ahmed Khaled, Chi Jin

TL;DR
This paper introduces the concept of tuning-free stochastic optimization algorithms that can adaptively match the performance of optimally-tuned methods across various settings, addressing the high cost of hyperparameter tuning in large-scale machine learning.
Contribution
The paper formalizes tuning-free algorithms, shows that they can match optimally-tuned SGD over bounded domains, discusses conditions under which this remains possible over unbounded domains, and introduces variants that match the performance of tuned SGD at only polylogarithmic additional cost.
Findings
Tuning-free algorithms can match optimally-tuned SGD in bounded domains.
For minimizing a convex and smooth or Lipschitz function over an unbounded domain, tuning-free optimization is impossible without additional assumptions.
DoG and DoWG are tuning-free over unbounded domains when the noise distribution is sufficiently well-behaved.
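To make the idea of a self-tuning step size concrete, here is a minimal Python sketch of a DoG-style ("Distance over Gradients") update, one of the algorithms named in the findings above. It is an illustration under stated assumptions, not the paper's reference implementation: the function name dog_sgd, the default r_eps, the noisy quadratic objective, and the choice to return the last iterate rather than an averaged one are all hypothetical. The step size at each iteration is the largest distance travelled from the starting point so far, divided by the square root of the running sum of squared gradient norms, so no learning rate is supplied by the user.

```python
import math
import random


def dog_sgd(stochastic_grad, x0, steps, r_eps=1e-4):
    """SGD with a DoG-style step size (illustrative sketch).

    r_eps is a small initial "movement" that seeds the distance estimate;
    its default value here is an assumption made for this example.
    """
    x = list(x0)
    max_dist = r_eps      # running max of ||x_i - x0||, floored at r_eps
    grad_sq_sum = 0.0     # running sum of ||g_i||^2
    for _ in range(steps):
        g = stochastic_grad(x)
        grad_sq_sum += sum(gi * gi for gi in g)
        if grad_sq_sum == 0.0:
            continue      # every gradient so far was zero, so no step is taken
        eta = max_dist / math.sqrt(grad_sq_sum)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        max_dist = max(max_dist, math.dist(x, x0))
    return x


# Hypothetical test problem: f(x) = 0.5 * ||x - target||^2 with Gaussian gradient noise.
rng = random.Random(0)
target = [3.0, -2.0]


def noisy_grad(x):
    return [xi - ti + rng.gauss(0.0, 0.1) for xi, ti in zip(x, target)]


x_final = dog_sgd(noisy_grad, x0=[0.0, 0.0], steps=2000)
print("distance to minimizer:", math.dist(x_final, target))
```

The only user-supplied quantity is r_eps, which plays the role of a loose hint: the step size starts tiny, grows geometrically as the iterates move away from x0, and then stabilizes once the distance estimate is of the right order.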
Abstract
Large-scale machine learning problems make the cost of hyperparameter tuning ever more prohibitive. This creates a need for algorithms that can tune themselves on-the-fly. We formalize the notion of "tuning-free" algorithms that can match the performance of optimally-tuned optimization algorithms up to polylogarithmic factors given only loose hints on the relevant problem parameters. We consider in particular algorithms that can match optimally-tuned Stochastic Gradient Descent (SGD). When the domain of optimization is bounded, we show tuning-free matching of SGD is possible and achieved by several existing algorithms. We prove that for the task of minimizing a convex and smooth or Lipschitz function over an unbounded domain, tuning-free optimization is impossible. We discuss conditions under which tuning-free optimization is possible even over unbounded domains. In particular, we show…
Taxonomy
Topics: Simulation Techniques and Applications
Methods: Stochastic Gradient Descent
