Variance-Reduced and Projection-Free Stochastic Optimization

Elad Hazan; Haipeng Luo

arXiv:1602.02101 · cs.LG · September 15, 2017 · 79 cites

TL;DR

This paper introduces variance-reduced, projection-free stochastic optimization algorithms based on Frank-Wolfe, significantly reducing the number of gradient evaluations needed for high-accuracy solutions in machine learning tasks.

Contribution

The paper proposes two stochastic Frank-Wolfe variants that use variance reduction and need fewer stochastic gradient evaluations than previous methods to reach a given accuracy.

Findings

01

Reduced stochastic gradient evaluations from O(1/ε) to O(ln(1/ε)) for smooth and strongly convex objectives.

02

Reduced stochastic gradient evaluations from O(1/ε²) to O(1/ε^{1.5}) for smooth and Lipschitz objectives.

03

Experimental validation on real datasets confirms theoretical improvements.

Abstract

The Frank-Wolfe optimization algorithm has recently regained popularity for machine learning applications due to its projection-free property and its ability to handle structured constraints. However, in the stochastic learning setting, it is still relatively understudied compared to the gradient descent counterpart. In this work, leveraging a recent variance reduction technique, we propose two stochastic Frank-Wolfe variants which substantially improve previous results in terms of the number of stochastic gradient evaluations needed to achieve $1 - \epsilon$ accuracy. For example, we improve from $O(\frac{1}{\epsilon})$ to $O(\ln \frac{1}{\epsilon})$ if the objective function is smooth and strongly convex, and from $O(\frac{1}{\epsilon^{2}})$ to $O(\frac{1}{\epsilon^{1.5}})$ if the objective function is smooth and Lipschitz. The theoretical improvement is also observed in experiments on…
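The abstract names the two ingredients, a variance-reduced gradient estimate and a projection-free Frank-Wolfe step, but this page does not give the algorithms themselves. The following is a minimal sketch of how the two ideas can be combined, not the paper's exact method. Everything concrete in it is an assumption made for illustration: the least-squares objective, the l1-ball constraint, the helper names (`grad_i`, `full_grad`, `lmo_l1`, `vr_frank_wolfe`), the step size 2/(k+1), and the fixed batch size and epoch length.

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def grad_i(x, a, b):
    # Gradient of the single-example loss 0.5 * (a.x - b)^2 (assumed objective).
    r = dot(a, x) - b
    return [r * aj for aj in a]


def full_grad(x, A, B):
    # Exact gradient of the average loss over all n examples.
    n, d = len(A), len(x)
    g = [0.0] * d
    for a, b in zip(A, B):
        gi = grad_i(x, a, b)
        for j in range(d):
            g[j] += gi[j] / n
    return g


def lmo_l1(g, radius):
    # Linear minimization oracle for the l1 ball: argmin_{||v||_1 <= radius} <v, g>.
    # The minimizer is a signed, scaled coordinate vector, so no projection is needed.
    j = max(range(len(g)), key=lambda k: abs(g[k]))
    v = [0.0] * len(g)
    if g[j] != 0:
        v[j] = -radius * (1 if g[j] > 0 else -1)
    return v


def objective(x, A, B):
    return sum(0.5 * (dot(a, x) - b) ** 2 for a, b in zip(A, B)) / len(A)


def vr_frank_wolfe(A, B, radius, epochs=5, inner_steps=30, batch=8, seed=0):
    # Illustrative schedule only: fixed epochs, inner steps, and batch size.
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(epochs):
        snapshot = list(x)
        mu = full_grad(snapshot, A, B)  # one full gradient per epoch
        for k in range(1, inner_steps + 1):
            # Variance-reduced estimate: mu + mean_i [grad_i(x) - grad_i(snapshot)].
            g = list(mu)
            for _ in range(batch):
                i = rng.randrange(n)
                gx = grad_i(x, A[i], B[i])
                gs = grad_i(snapshot, A[i], B[i])
                for j in range(d):
                    g[j] += (gx[j] - gs[j]) / batch
            v = lmo_l1(g, radius)
            gamma = 2.0 / (k + 1)
            # Convex combination keeps x inside the constraint set.
            x = [(1 - gamma) * xj + gamma * vj for xj, vj in zip(x, v)]
    return x


if __name__ == "__main__":
    data_rng = random.Random(1)
    x_true = [0.5, -0.3, 0.0, 0.0, 0.0]  # hypothetical ground truth
    A = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
    B = [dot(a, x_true) for a in A]
    x_hat = vr_frank_wolfe(A, B, radius=1.0)
    print("objective at zero:", objective([0.0] * 5, A, B))
    print("objective at x_hat:", objective(x_hat, A, B))
```

Each iterate is a convex combination of points in the l1 ball, so it stays feasible without any projection; the only per-step work beyond gradient evaluations is the linear minimization in `lmo_l1`. The snapshot gradient `mu` is recomputed once per epoch, which is what lets the per-step estimate use a small batch.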

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms