Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method for Empirical Risk Minimization
Zikai Xiong, Robert M. Freund

TL;DR
This paper introduces Taylor-approximated gradient methods to enhance the Frank-Wolfe algorithm, significantly reducing computational dependence on data size while maintaining optimal convergence in empirical risk minimization tasks.
Contribution
It proposes a novel Taylor series-based gradient approximation for Frank-Wolfe, applicable in both deterministic and stochastic settings, with adaptive step-size and proven computational guarantees.
Findings
Significant speed-ups over existing methods on real datasets
Reduced dependence on data size in large-scale problems
Achieved optimal convergence rates in convex and non-convex settings
Abstract
The Frank-Wolfe method has become increasingly useful in statistical and machine learning applications, due to the structure-inducing properties of the iterates, and especially in settings where linear minimization over the feasible set is more computationally efficient than projection. In the setting of Empirical Risk Minimization -- one of the fundamental optimization problems in statistical and machine learning -- the computational effectiveness of Frank-Wolfe methods typically grows linearly in the number of data observations n. This is in stark contrast to the case for typical stochastic projection methods. In order to reduce this dependence on n, we look to second-order smoothness of typical smooth loss functions (least squares loss and logistic loss, for example) and we propose amending the Frank-Wolfe method with Taylor series-approximated gradients, including variants for…
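To make the idea of a Taylor-approximated gradient concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm. It assumes logistic loss with labels in {-1, +1}, an l1-ball feasible set, the classic 2/(k+2) step size, and a hypothetical `refresh` parameter that controls how often the exact gradient is recomputed. All function names are illustrative. The Hessian-vector product here still visits every observation, so the sketch shows only the approximation itself and not the reduced dependence on n that the abstract describes.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def loss(A, b, x):
    """Average logistic loss (1/n) * sum_i log(1 + exp(-b_i * a_i^T x))."""
    return sum(math.log1p(math.exp(-bi * dot(ai, x))) for ai, bi in zip(A, b)) / len(A)


def exact_grad(A, b, x):
    """Exact gradient of the average logistic loss; touches every observation."""
    g = [0.0] * len(x)
    for ai, bi in zip(A, b):
        c = -bi * sigmoid(-bi * dot(ai, x)) / len(A)
        for j, aij in enumerate(ai):
            g[j] += c * aij
    return g


def hess_vec(A, x, d):
    """Hessian-vector product (1/n) * sum_i s_i * (1 - s_i) * (a_i^T d) * a_i."""
    out = [0.0] * len(x)
    for ai in A:
        s = sigmoid(dot(ai, x))
        c = s * (1.0 - s) * dot(ai, d) / len(A)
        for j, aij in enumerate(ai):
            out[j] += c * aij
    return out


def taylor_grad(g_ref, A, x_ref, x):
    """First-order Taylor estimate of the gradient at x, expanded around x_ref."""
    d = [xj - rj for xj, rj in zip(x, x_ref)]
    return [gj + hj for gj, hj in zip(g_ref, hess_vec(A, x_ref, d))]


def frank_wolfe_taylor(A, b, radius=1.0, iters=200, refresh=10):
    """Frank-Wolfe over the l1 ball, tracking the gradient with Taylor updates.

    `refresh` is a hypothetical knob: the exact gradient is recomputed every
    `refresh` iterations to keep the accumulated approximation error in check.
    """
    p = len(A[0])
    x = [0.0] * p
    g = exact_grad(A, b, x)
    for k in range(iters):
        if k > 0 and k % refresh == 0:
            g = exact_grad(A, b, x)
        # Linear minimization oracle for the l1 ball: a signed, scaled coordinate vector.
        j = max(range(p), key=lambda i: abs(g[i]))
        s = [0.0] * p
        s[j] = -radius if g[j] > 0 else radius
        step = 2.0 / (k + 2)
        x_new = [xi + step * (si - xi) for xi, si in zip(x, s)]
        # The Taylor update stands in for a fresh gradient evaluation at x_new.
        g = taylor_grad(g, A, x, x_new)
        x = x_new
    return x
```

For least squares loss the Hessian is constant, so the same first-order update would reproduce the gradient exactly; for logistic loss it is only an approximation, which is why this sketch periodically refreshes the exact gradient.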
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Statistical Methods and Inference
