The Lingering of Gradients: Theory and Applications
Zeyuan Allen-Zhu, David Simchi-Levi, Xinshang Wang

TL;DR
This paper introduces a refined analysis of gradient-based methods by considering the lingering effect of gradients, leading to faster convergence rates and improved practical performance in large-scale optimization tasks.
Contribution
It develops a theoretical framework for gradient lingering, demonstrating improved convergence rates and applying it to real-world large-scale problems.
Findings
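When the additional time to compute a gradient scales linearly with the traveled distance, the "convergence rate" of gradient descent improves from 1/T to exp(-T^{1/3})
Solved a hypothetical revenue management problem on the Yahoo! Front Page Today Module (4.6m users) using only 6 passes of the dataset
Solved a real-life SVM problem to an accuracy two orders of magnitude better than existing algorithms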
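To get a feel for the gap between the two rates named in the findings, the short sketch below evaluates 1/T and exp(-T^{1/3}) at a few values of T. It is only an illustration: the function names are hypothetical, and the constants hidden inside the actual bounds are ignored, so the numbers show the shape of the curves rather than any guarantee from the paper.

```python
import math


def classical_rate(T):
    """The 1/T rate, with all constants dropped."""
    return 1.0 / T


def lingering_rate(T):
    """The exp(-T^{1/3}) rate, with all constants dropped."""
    return math.exp(-(T ** (1.0 / 3.0)))


# Print both rates side by side for growing T.
for T in (10, 100, 1000, 10000):
    print(f"T={T:>6}  1/T={classical_rate(T):.3e}  exp(-T^(1/3))={lingering_rate(T):.3e}")
```

With constants dropped, the exponential form only pulls ahead once T is large enough (it is still slightly worse at T = 10), after which the gap widens quickly.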
Abstract
Classically, the time complexity of a first-order method is estimated by its number of gradient computations. In this paper, we study a more refined complexity by taking into account the "lingering" of gradients: once a gradient is computed at x_k, the additional time to compute gradients at x_{k+1}, x_{k+2}, ... may be reduced. We show how this improves the running time of several first-order methods. For instance, if the "additional time" scales linearly with respect to the traveled distance, then the "convergence rate" of gradient descent can be improved from 1/T to exp(-T^{1/3}). On the application side, we solve a hypothetical revenue management problem on the Yahoo! Front Page Today Module with 4.6m users to 10^{-6} error using only 6 passes of the dataset; and solve a real-life support vector machine problem to an accuracy that is two orders of magnitude better…
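The lingering idea in the abstract can be made concrete with a small sketch. The setup here is an assumption chosen for illustration, not the paper's algorithm: the objective is an average of hinge losses f_i(x) = max(0, 1 - b_i <a_i, x>), whose gradients are piecewise constant, so the gradient of f_i computed at one point stays valid until x has moved at least |1 - b_i <a_i, x>| / ||a_i|| away from it. The names `hinge_grad_and_radius` and `lingering_gd` are hypothetical.

```python
import numpy as np


def hinge_grad_and_radius(a, b, x):
    """(Sub)gradient of max(0, 1 - b*<a, x>) and the distance to its kink.

    Within that distance of x, the gradient does not change.
    """
    margin = 1.0 - b * a.dot(x)
    grad = -b * a if margin > 0 else np.zeros_like(a)
    radius = abs(margin) / np.linalg.norm(a)
    return grad, radius


def lingering_gd(A, b, x0, step=0.1, iters=50):
    """Subgradient descent that recomputes a component gradient only
    after x has left that component's lingering radius."""
    n = A.shape[0]
    x = x0.astype(float).copy()
    anchors = np.tile(x, (n, 1))           # where each gradient was last computed
    grads = np.zeros_like(A, dtype=float)  # cached component gradients
    radii = np.full(n, -1.0)               # negative: forces a first computation
    computed = 0
    for _ in range(iters):
        dist = np.linalg.norm(anchors - x, axis=1)
        stale = dist >= radii              # cached gradient may no longer be valid
        for i in np.flatnonzero(stale):
            grads[i], radii[i] = hinge_grad_and_radius(A[i], b[i], x)
            anchors[i] = x
            computed += 1
        x = x - step * grads.mean(axis=0)
    return x, computed


# Usage on synthetic, linearly separable data (assumed for illustration).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
b = np.sign(A @ rng.standard_normal(5))
x, computed = lingering_gd(A, b, np.zeros(5))
print("component gradients computed:", computed, "of", 200 * 50)
```

The iterates match those of plain subgradient descent, which recomputes every component gradient at every step; only the number of gradient computations changes. That count is the refined cost the abstract proposes to measure.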
Taxonomy
Topics: Optimization and Search Problems · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research
