Variance-Reduced and Projection-Free Stochastic Optimization
Elad Hazan, Haipeng Luo

TL;DR
This paper introduces variance-reduced, projection-free stochastic optimization algorithms based on Frank-Wolfe, significantly reducing the number of gradient evaluations needed for high-accuracy solutions in machine learning tasks.
Contribution
It proposes two novel stochastic Frank-Wolfe variants utilizing variance reduction, achieving improved theoretical convergence rates over previous methods.
Findings
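Reduces the number of stochastic gradient evaluations needed to reach 1 − ε accuracy from O(1/ε) to O(ln(1/ε)) for smooth and strongly convex objectives.
Reduces the number of stochastic gradient evaluations from O(1/ε²) to O(1/ε^{1.5}) for smooth and Lipschitz objectives.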
Experimental validation on real datasets confirms theoretical improvements.
Abstract
The Frank-Wolfe optimization algorithm has recently regained popularity for machine learning applications due to its projection-free property and its ability to handle structured constraints. However, in the stochastic learning setting, it is still relatively understudied compared to the gradient descent counterpart. In this work, leveraging a recent variance reduction technique, we propose two stochastic Frank-Wolfe variants which substantially improve previous results in terms of the number of stochastic gradient evaluations needed to achieve 1 − ε accuracy. For example, we improve from O(1/ε) to O(ln(1/ε)) if the objective function is smooth and strongly convex, and from O(1/ε²) to O(1/ε^{1.5}) if the objective function is smooth and Lipschitz. The theoretical improvement is also observed in experiments on…
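To make the idea in the abstract concrete, the following is a minimal sketch of how a Frank-Wolfe step can be combined with an SVRG-style variance-reduced gradient estimate. It is an illustration under stated assumptions, not the authors' exact algorithm: the least-squares objective, the l1-ball constraint, the fixed mini-batch size, the fixed number of inner steps, and all function names (lmo_l1, vr_gradient, vr_frank_wolfe) are hypothetical choices made for this example, and the 2/(k+1) step size is the classic Frank-Wolfe schedule rather than a parameter taken from this page.

```python
import random


def lmo_l1(g, radius):
    """Linear minimization oracle: argmin of <g, v> over the l1 ball of given radius.

    This replaces the projection step of projected gradient descent.
    """
    j = max(range(len(g)), key=lambda i: abs(g[i]))
    v = [0.0] * len(g)
    if g[j] != 0.0:
        v[j] = -radius if g[j] > 0 else radius
    return v


def grad_i(x, a, b):
    """Gradient of the single-example loss 0.5 * (<a, x> - b)^2 (assumed objective)."""
    r = sum(ai * xi for ai, xi in zip(a, x)) - b
    return [r * ai for ai in a]


def full_grad(x, A, y):
    """Exact gradient of the average loss over all examples."""
    n = len(A)
    g = [0.0] * len(x)
    for a, b in zip(A, y):
        g = [gj + gij / n for gj, gij in zip(g, grad_i(x, a, b))]
    return g


def vr_gradient(x, snapshot, mu, A, y, idx):
    """SVRG-style estimate: mu + mean over idx of (grad_i(x) - grad_i(snapshot)).

    mu is the full gradient at the snapshot; the estimate is unbiased and its
    variance shrinks as x approaches the snapshot.
    """
    g = list(mu)
    for i in idx:
        gx = grad_i(x, A[i], y[i])
        gs = grad_i(snapshot, A[i], y[i])
        g = [gj + (a - b) / len(idx) for gj, a, b in zip(g, gx, gs)]
    return g


def vr_frank_wolfe(A, y, radius, epochs=5, inner_steps=20, batch=4, seed=0):
    """Illustrative variance-reduced Frank-Wolfe loop (schedules are assumptions)."""
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for _ in range(epochs):
        snapshot = list(x)
        mu = full_grad(snapshot, A, y)  # one full gradient per epoch
        for k in range(1, inner_steps + 1):
            idx = [rng.randrange(len(A)) for _ in range(batch)]
            g = vr_gradient(x, snapshot, mu, A, y, idx)
            v = lmo_l1(g, radius)
            gamma = 2.0 / (k + 1)
            # Convex combination of feasible points, so x stays feasible
            # without any projection.
            x = [(1 - gamma) * xi + gamma * vi for xi, vi in zip(x, v)]
    return x
```

Calling vr_frank_wolfe(A, y, radius=1.0) with A as a list of feature rows and y as a list of targets returns an iterate inside the l1 ball. Each epoch costs one full gradient plus a handful of cheap single-example gradients, which is the trade-off that variance reduction exploits.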
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms
