
TL;DR
This paper investigates the limitations of the Frank-Wolfe algorithm in high-dimensional settings, showing that it can require up to d iterations on certain random polytopes, and ties these lower bounds to the metric entropy of the domain, with implications for statistical algorithms.
Contribution
It introduces a technique to establish lower bounds for Frank-Wolfe based on metric entropy and demonstrates that linear convergence cannot be achieved in average-case high-dimensional scenarios.
Findings
Frank-Wolfe requires up to d iterations for certain random polytopes.
Dimension-free linear bounds fail in average-case high dimensions.
The results have positive implications for statistical algorithms like gradient boosting.
Abstract
The Frank-Wolfe algorithm has seen a resurgence in popularity due to its ability to efficiently solve constrained optimization problems in machine learning and high-dimensional statistics. As such, there is much interest in establishing when the algorithm may possess a "linear" dimension-free iteration complexity comparable to projected gradient descent. In this paper, we provide a general technique for establishing domain specific and easy-to-estimate lower bounds for Frank-Wolfe and its variants using the metric entropy of the domain. Most notably, we show that a dimension-free linear upper bound must fail not only in the worst case, but in the average case: for a Gaussian or spherical random polytope in with vertices, Frank-Wolfe requires up to iterations to achieve a error bound, with high…
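For readers unfamiliar with the setting, the following is a minimal sketch of vanilla Frank-Wolfe run over the convex hull of Gaussian random vertices. It is not the paper's construction or experiment: the quadratic objective, the 2/(t+2) step size, the function names `frank_wolfe` and `gaussian_polytope`, and the sizes in the demo are all assumptions chosen for illustration.

```python
import random


def gaussian_polytope(num_vertices, dim, seed=0):
    """Draw vertices with i.i.d. standard Gaussian coordinates (illustrative sizes only)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_vertices)]


def frank_wolfe(vertices, target, num_iters):
    """Minimize 0.5 * ||x - target||^2 over conv(vertices).

    Assumed setup: quadratic objective and the classic 2/(t+2) step size.
    Returns the final iterate and the Frank-Wolfe gap at each iteration.
    """
    dim = len(target)
    x = list(vertices[0])  # any vertex is a feasible starting point
    gaps = []
    for t in range(num_iters):
        grad = [x[i] - target[i] for i in range(dim)]
        # Linear minimization oracle: the vertex minimizing <grad, v>.
        s = min(vertices, key=lambda v: sum(g * vi for g, vi in zip(grad, v)))
        # Frank-Wolfe gap <grad, x - s>, an upper bound on the suboptimality.
        gaps.append(sum(g * (xi - si) for g, xi, si in zip(grad, x, s)))
        gamma = 2.0 / (t + 2.0)
        # Move toward the selected vertex; x stays a convex combination of vertices.
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x, gaps


if __name__ == "__main__":
    dim = 20
    verts = gaussian_polytope(num_vertices=200, dim=dim, seed=0)
    _, gaps = frank_wolfe(verts, target=[0.0] * dim, num_iters=50)
    for t in (0, 9, 49):
        print(f"iteration {t + 1}: gap = {gaps[t]:.4f}")
```

Each iteration adds at most one new vertex to the iterate's convex combination, which gives some intuition for why the number of iterations needed can be tied to how many vertices are required to approximate a point in the domain.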
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning
