An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning
Blake Woodworth, Nathan Srebro

TL;DR
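This paper presents a minibatch stochastic gradient algorithm for smooth convex and strongly convex objectives whose guarantee is optimal in both the minibatch size and the minimum expected loss simultaneously. In interpolation learning, this gives a linear rather than square-root parallelization speedup over Cotter et al. (2011) and Liu and Belkin (2018).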
Contribution
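The paper proposes and analyzes a stochastic optimization algorithm that is optimal with respect to its dependence on both the minibatch size and the minimum expected loss at the same time. Prior methods achieve at most one of these: Lan (2012) is insensitive to the minimum expected loss, while Cotter et al. (2011) and Liu and Belkin (2018) have suboptimal dependence on the minibatch size.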
Findings
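Achieves optimal dependence on the minibatch size and the minimum expected loss simultaneously.
Yields a linear, rather than square-root, parallelization speedup in interpolation learning, compared with Cotter et al. (2011) and Liu and Belkin (2018).
Applies to general smooth convex and strongly convex objectives, unlike Liu and Belkin (2018), which is limited to least squares problems.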
Abstract
We present and analyze an algorithm for optimizing smooth and convex or strongly convex objectives using minibatch stochastic gradient estimates. The algorithm is optimal with respect to its dependence on both the minibatch size and minimum expected loss simultaneously. This improves over the optimal method of Lan (2012), which is insensitive to the minimum expected loss; over the optimistic acceleration of Cotter et al. (2011), which has suboptimal dependence on the minibatch size; and over the algorithm of Liu and Belkin (2018), which is limited to least squares problems and is also similarly suboptimal with respect to the minibatch size. Applied to interpolation learning, the improvement over Cotter et al. and Liu and Belkin translates to a linear, rather than square-root, parallelization speedup.
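The two quantities the abstract refers to, minibatch size and minimum expected loss, can be illustrated with a small sketch. This is not the paper's algorithm. It only measures the squared norm of a plain minibatch gradient at the minimizer of a synthetic least squares problem. The helper names (`make_problem`, `minibatch_gradient`, `gradient_sq_norm_at`) and the Gaussian data model are assumptions made for illustration. With zero label noise the problem is an interpolation problem (minimum loss zero), and every per-example gradient vanishes at the minimizer. With label noise, the gradient noise at the minimizer is nonzero and shrinks as the minibatch grows.

```python
import random


def make_problem(n, d, noise, seed=0):
    """Synthetic least squares data: y = <x, w_star> + noise * gaussian."""
    rng = random.Random(seed)
    w_star = [rng.gauss(0, 1) for _ in range(d)]
    xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    ys = [
        sum(a * b for a, b in zip(x, w_star)) + noise * rng.gauss(0, 1)
        for x in xs
    ]
    return xs, ys, w_star


def minibatch_gradient(w, xs, ys, batch):
    """Average gradient of 0.5 * (<x, w> - y)^2 over the given example indices."""
    g = [0.0] * len(w)
    for i in batch:
        residual = sum(a * b for a, b in zip(xs[i], w)) - ys[i]
        for j in range(len(w)):
            g[j] += residual * xs[i][j] / len(batch)
    return g


def gradient_sq_norm_at(w, xs, ys, b, trials, seed=1):
    """Mean squared norm of the minibatch gradient at w, over random batches of size b."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Sample b indices uniformly with replacement.
        batch = [rng.randrange(len(xs)) for _ in range(b)]
        g = minibatch_gradient(w, xs, ys, batch)
        total += sum(v * v for v in g)
    return total / trials


if __name__ == "__main__":
    for noise in (0.0, 1.0):
        xs, ys, w_star = make_problem(n=200, d=5, noise=noise)
        for b in (1, 4, 16):
            value = gradient_sq_norm_at(w_star, xs, ys, b, trials=200)
            print(f"noise={noise} b={b} mean ||g||^2 at w_star = {value:.4f}")
```

Running the script prints the mean squared gradient norm for each combination of noise level and minibatch size. In the noiseless case the value is zero for every minibatch size, which is the regime where sensitivity to the minimum expected loss matters.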
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms
