Gradient Descent with Provably Tuned Learning-rate Schedules
Dravyansh Sharma

TL;DR
This paper introduces an analytical framework for provably tuning gradient-descent hyperparameters on non-convex, non-smooth functions, including neural networks.
Contribution
It develops novel tools for provably tuning hyperparameters in gradient descent for a wide class of functions beyond convex and smooth cases.
Findings
Obtains sample complexity bounds for learning the step-size in non-convex, non-smooth settings that match those shown for smooth, convex functions.
Extends to tuning multiple hyperparameters simultaneously, including learning-rate schedules and momentum.
Applicable to neural networks with common activation functions like ReLU, sigmoid, and tanh.
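To make the tuned hyperparameters concrete, here is a minimal sketch of gradient descent with heavy-ball momentum and a per-step learning-rate schedule on a one-neuron ReLU model. It is not the paper's tuning procedure. The names `gd_momentum` and `loss_and_grad`, the toy data, and the particular schedule and momentum values are all illustrative assumptions.

```python
def relu(z):
    return z if z > 0 else 0.0

def loss_and_grad(w, data):
    """Mean squared error of a one-neuron ReLU model y_hat = relu(w * x),
    together with a (sub)gradient with respect to w."""
    loss, grad = 0.0, 0.0
    for x, y in data:
        z = w * x
        err = relu(z) - y
        loss += 0.5 * err * err
        # The ReLU is non-smooth at z = 0; we use the subgradient 0 there.
        grad += err * (x if z > 0 else 0.0)
    n = len(data)
    return loss / n, grad / n

def gd_momentum(w0, data, schedule, beta):
    """Heavy-ball gradient descent. `schedule` is the list of per-step
    learning rates and `beta` is the momentum coefficient; these are the
    hyperparameters one would tune."""
    w, v = w0, 0.0
    for eta in schedule:
        _, g = loss_and_grad(w, data)
        v = beta * v - eta * g
        w = w + v
    return w, loss_and_grad(w, data)[0]

# Hypothetical toy data generated by the target weight w* = 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w0 = 0.5
initial_loss, _ = loss_and_grad(w0, data)

schedule = [0.1] * 20   # assumed constant schedule, for illustration only
beta = 0.5              # assumed momentum value
w_final, final_loss = gd_momentum(w0, data, schedule, beta)
print(initial_loss, final_loss)
```

Each entry of `schedule` and the value of `beta` is a separate knob, so tuning a length-T schedule with momentum means searching over T + 1 real-valued hyperparameters rather than a single step-size.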
Abstract
Gradient-based iterative optimization methods are the workhorse of modern machine learning. They crucially rely on careful tuning of parameters like learning rate and momentum. However, one typically sets them using heuristic approaches without formal near-optimality guarantees. Recent work by Gupta and Roughgarden studies how to learn a good step-size in gradient descent. However, like most of the literature with theoretical guarantees for gradient-based optimization, their results rely on strong assumptions on the function class, including convexity and smoothness, which do not hold in typical applications. In this work, we develop novel analytical tools for provably tuning hyperparameters in gradient-based algorithms that apply to non-convex and non-smooth functions. We obtain matching sample complexity bounds for learning the step-size in gradient descent shown for smooth, convex…
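The step-size learning setting described in the abstract can be sketched as follows: draw a sample of problem instances, then pick the step-size with the lowest average cost on that sample. This is a toy illustration only. The instance family (a 1-D non-convex, non-smooth function), the cost (objective value after a fixed number of steps), the finite candidate grid, and all function names are assumptions, not details taken from the paper.

```python
import math
import random

def f(x, a, b):
    # Hypothetical 1-D objective: non-smooth at x = b, non-convex due to the cosine.
    return abs(x - b) + a * math.cos(3.0 * x)

def subgrad(x, a, b):
    # A subgradient of f; at the kink x = b we pick 0 for the |x - b| term.
    sign = 1.0 if x > b else (-1.0 if x < b else 0.0)
    return sign - 3.0 * a * math.sin(3.0 * x)

def final_value(instance, eta, num_steps=50):
    """Run (sub)gradient descent with constant step-size eta and
    return the objective value at the last iterate."""
    a, b, x = instance
    for _ in range(num_steps):
        x -= eta * subgrad(x, a, b)
    return f(x, a, b)

def average_cost(instances, eta):
    return sum(final_value(inst, eta) for inst in instances) / len(instances)

def tune_step_size(instances, candidates):
    """Empirical risk minimization over a finite grid of step-sizes."""
    return min(candidates, key=lambda eta: average_cost(instances, eta))

# Sample instances (a, b, x0) from an assumed distribution.
rng = random.Random(0)
train = [
    (rng.uniform(0.1, 0.5), rng.uniform(-1.0, 1.0), rng.uniform(-3.0, 3.0))
    for _ in range(100)
]
candidates = [0.01 * k for k in range(1, 101)]

best_eta = tune_step_size(train, candidates)
print(best_eta, average_cost(train, best_eta))
```

A sample complexity bound answers how many training instances are needed so that the step-size chosen this way also performs near-optimally, in expectation, on fresh instances from the same distribution.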
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data
