Stop Wasting My Gradients: Practical SVRG
Reza Babanezhad, Mohamed Osama Ahmed, Alim Virani, Mark Schmidt, Jakub Konečný, Scott Sallinen

TL;DR
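This paper presents practical improvements to the stochastic variance-reduced gradient (SVRG) method: growing-batch and support-vector strategies that cut the number of gradient computations, plus an analysis showing that the regularized SVRG update improves the convergence rate. Together these make SVRG cheaper for large-scale machine learning tasks.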
Contribution
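The paper shows that SVRG keeps its convergence rate when the control variate is computed with a decreasing sequence of errors, and uses this result to derive growing-batch variants. It also exploits support vectors, justifies the regularized SVRG iteration, and considers alternative mini-batch selection strategies and the generalization error of the method.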
Findings
Growing-batch strategies reduce the number of gradient calculations in early iterations.
Exploiting support vectors reduces the number of gradient computations in later iterations.
The commonly used regularized SVRG iteration is justified and improves the convergence rate.
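The support-vector finding can be illustrated with a minimal sketch. It assumes a Huberized hinge loss with smoothing parameter eps, whose gradient is exactly zero once an example's margin exceeds 1 + eps. The function names (snapshot_pass, svrg_direction) and the bookkeeping are hypothetical and are not taken from the paper. The idea shown here is only that examples with zero gradient at the snapshot can be recorded during the full pass, so the inner loop does not need to re-evaluate their snapshot gradient.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def huber_hinge_slope(margin, eps=0.5):
    """Derivative of the Huberized hinge loss with respect to the margin."""
    if margin > 1.0 + eps:
        return 0.0                      # not a support vector: zero gradient
    if margin < 1.0 - eps:
        return -1.0                     # linear part of the loss
    return -(1.0 + eps - margin) / (2.0 * eps)  # quadratic smoothing zone

def grad_i(w, x, y, eps=0.5):
    """Gradient of one loss term; label y is assumed to be in {-1, +1}."""
    s = huber_hinge_slope(y * dot(w, x), eps)
    return [s * y * xj for xj in x]

def snapshot_pass(w_snap, X, Y, eps=0.5):
    """Full-gradient pass that also records examples with zero gradient."""
    n, d = len(X), len(X[0])
    mu = [0.0] * d
    inactive = set()                    # indices that are not support vectors
    for i, (x, y) in enumerate(zip(X, Y)):
        if huber_hinge_slope(y * dot(w_snap, x), eps) == 0.0:
            inactive.add(i)
            continue                    # contributes nothing to mu
        g = grad_i(w_snap, x, y, eps)
        mu = [m + gj / n for m, gj in zip(mu, g)]
    return mu, inactive

def svrg_direction(w, w_snap, mu, inactive, i, X, Y, eps=0.5):
    """Return the SVRG direction and the number of gradient evaluations used."""
    g_new = grad_i(w, X[i], Y[i], eps)
    if i in inactive:
        # The snapshot gradient is known to be zero, so it is not recomputed.
        return [a + m for a, m in zip(g_new, mu)], 1
    g_old = grad_i(w_snap, X[i], Y[i], eps)
    return [a - b + m for a, b, m in zip(g_new, g_old, mu)], 2
```

In this sketch the saving is one gradient evaluation per inner step on an inactive example. The saving should grow in later iterations, when more examples are classified with a comfortable margin.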
Abstract
We present and analyze several strategies for improving the performance of stochastic variance-reduced gradient (SVRG) methods. We first show that the convergence rate of these methods can be preserved under a decreasing sequence of errors in the control variate, and use this to derive variants of SVRG that use growing-batch strategies to reduce the number of gradient calculations required in the early iterations. We further (i) show how to exploit support vectors to reduce the number of gradient computations in the later iterations, (ii) prove that the commonly-used regularized SVRG iteration is justified and improves the convergence rate, (iii) consider alternate mini-batch selection strategies, and (iv) consider the generalization error of the method.
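The growing-batch idea from the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes L2-regularized logistic regression, a geometric batch-growth schedule, and the last inner iterate as the next snapshot. All function names and default parameters (batching_svrg, batch0, growth, step) are hypothetical choices made for the example.

```python
import math
import random

def grad_i(w, x, y, lam):
    """Gradient of one L2-regularized logistic loss term; y is in {-1, +1}."""
    margin = y * sum(wj * xj for wj, xj in zip(w, x))
    coef = -y / (1.0 + math.exp(margin))
    return [coef * xj + lam * wj for xj, wj in zip(x, w)]

def objective(w, X, Y, lam):
    """Average logistic loss plus the L2 penalty."""
    total = 0.0
    for x, y in zip(X, Y):
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        total += math.log1p(math.exp(-margin))
    return total / len(X) + 0.5 * lam * sum(wj * wj for wj in w)

def batching_svrg(X, Y, lam=0.1, step=0.05, epochs=8,
                  batch0=8, growth=2.0, seed=0):
    """SVRG whose control variate uses a growing subsample of the data."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w_snap = [0.0] * d
    batch = float(batch0)
    for _ in range(epochs):
        # Approximate full gradient at the snapshot from a batch that grows
        # each epoch, so the error in the control variate shrinks over time.
        size = min(n, int(batch))
        mu = [0.0] * d
        for i in rng.sample(range(n), size):
            g = grad_i(w_snap, X[i], Y[i], lam)
            mu = [m + gj / size for m, gj in zip(mu, g)]
        # Inner loop: variance-reduced stochastic steps around the snapshot.
        w = list(w_snap)
        for _ in range(n):
            i = rng.randrange(n)
            g_new = grad_i(w, X[i], Y[i], lam)
            g_old = grad_i(w_snap, X[i], Y[i], lam)
            w = [wj - step * (a - b + m)
                 for wj, a, b, m in zip(w, g_new, g_old, mu)]
        w_snap = w          # assumption: last iterate becomes the snapshot
        batch *= growth     # assumption: geometric growth schedule
    return w_snap
```

Early epochs touch only a few examples to form the control variate, which is where the savings come from. Once the batch reaches the full dataset, the loop reduces to standard SVRG.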
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Privacy-Preserving Technologies in Data
