General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization
Kwangjun Ahn, Gagik Magakyan, Ashok Cutkosky

TL;DR
This paper introduces a general framework for converting online learning algorithms into nonconvex optimization methods, shows that schedule-free SGD achieves optimal iteration complexity for nonsmooth, nonconvex problems, and offers new insights into its parameter choices.
Contribution
The paper develops a unified framework for online-to-nonconvex conversion that recovers existing conversions and yields two new conversion schemes. One of the new schemes corresponds directly to schedule-free SGD, which lets the authors establish its optimality.
Findings
Schedule-free SGD achieves optimal iteration complexity for nonsmooth, nonconvex optimization problems.
The framework recovers existing online-to-nonconvex conversions and leads to two novel conversion schemes.
The analysis provides theoretical insights into parameter choices for schedule-free SGD.
Abstract
This work investigates the effectiveness of schedule-free methods, developed by A. Defazio et al. (NeurIPS 2024), in nonconvex optimization settings, inspired by their remarkable empirical success in training neural networks. Specifically, we show that schedule-free SGD achieves optimal iteration complexity for nonsmooth, nonconvex optimization problems. Our proof begins with the development of a general framework for online-to-nonconvex conversion, which converts a given online learning algorithm into an optimization algorithm for nonconvex losses. Our general framework not only recovers existing conversions but also leads to two novel conversion schemes. Notably, one of these new conversions corresponds directly to schedule-free SGD, allowing us to establish its optimality. Additionally, our analysis provides valuable insights into the parameter choices for schedule-free SGD,…
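For readers unfamiliar with the method, the following is a minimal sketch of the basic schedule-free SGD update as described by Defazio et al., not the nonconvex parameterization analyzed in this paper. The function name `schedule_free_sgd`, the default values of `lr` and `beta`, and the toy objective in the usage lines are assumptions made for illustration. The method keeps a base SGD sequence `z`, an averaged sequence `x`, and evaluates the gradient at an interpolation `y` of the two.

```python
import random


def schedule_free_sgd(grad, z0, steps, lr=0.1, beta=0.9):
    """Illustrative schedule-free SGD loop (hypothetical helper, plain lists).

    grad  : callable returning a (possibly stochastic) gradient at a point
    z0    : initial point, used for both the base and averaged sequences
    lr    : constant step size (assumed value, no schedule)
    beta  : interpolation weight between z and x (assumed value)
    """
    z = list(z0)  # base SGD sequence
    x = list(z0)  # running average of the z iterates
    for t in range(1, steps + 1):
        # Gradient is queried at an interpolation of z and x.
        y = [(1 - beta) * zi + beta * xi for zi, xi in zip(z, x)]
        g = grad(y)
        # Plain SGD step on the base sequence.
        z = [zi - lr * gi for zi, gi in zip(z, g)]
        # Uniform online average: x_{t+1} = (1 - c) x_t + c z_{t+1}, c = 1/(t+1).
        c = 1.0 / (t + 1)
        x = [(1 - c) * xi + c * zi for xi, zi in zip(x, z)]
    return x


if __name__ == "__main__":
    # Toy nonsmooth objective f(w) = sum |w_i| with noisy subgradients (assumed example).
    rng = random.Random(0)

    def noisy_subgrad(w):
        return [(1.0 if wi > 0 else -1.0 if wi < 0 else 0.0) + rng.gauss(0, 0.1) for wi in w]

    print(schedule_free_sgd(noisy_subgrad, [1.0, -2.0], steps=500, lr=0.01))
```

The returned point is the averaged sequence `x`, which is the iterate the method evaluates and reports; no learning-rate schedule or stopping time needs to be fixed in advance.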
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
Methods: Stochastic Gradient Descent
