Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting
Jakub Konečný, Jie Liu, Peter Richtárik, Martin Takáč

TL;DR
This paper introduces mS2GD, a mini-batch variant of semi-stochastic gradient descent (S2GD) for strongly convex problems defined over large datasets. Mini-batching improves both the theoretical complexity and the practical performance of S2GD, and it makes the stochastic steps parallelizable.
Contribution
The paper presents a novel mini-batching scheme for S2GD, enhancing efficiency and parallelizability in convex optimization with theoretical analysis.
Findings
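Mini-batching reduces the overall computational work needed to reach a fixed accuracy, provided the mini-batch size stays below a certain threshold.
The method benefits from two speedup effects: less total work, and a parallel implementation of the mini-batch gradient computations.
The method is suitable for large-scale convex optimization on parallel hardware.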
Abstract
We propose mS2GD: a method incorporating a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent (S2GD). We consider the problem of minimizing a strongly convex function represented as the sum of an average of a large number of smooth convex functions, and a simple nonsmooth convex regularizer. Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps. The process is repeated a few times, with the last iterate becoming the new starting point. The novelty of our method is in the introduction of mini-batching into the computation of stochastic steps. In each step, instead of choosing a single function, we sample b functions, compute their gradients, and compute the direction based on this. We analyze the…
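The loop described in the abstract (one full-gradient step, then many mini-batch stochastic steps, then a restart from the last iterate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the smooth terms are taken to be ridge-regularized least-squares losses, the nonsmooth regularizer is an L1 penalty handled by soft-thresholding, the number of inner steps is drawn uniformly from 1..m, and the step size h is a conservative heuristic. The names ms2gd, soft_threshold, objective, and all parameters are hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (assumed regularizer)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def objective(A, y, x, lam, mu):
    """F(x) = mean_i f_i(x) + lam * ||x||_1 with the assumed f_i below."""
    r = A @ x - y
    return 0.5 * np.mean(r ** 2) + 0.5 * mu * (x @ x) + lam * np.sum(np.abs(x))


def ms2gd(A, y, lam=0.01, mu=0.1, h=None, b=8, m=None, epochs=15, seed=0):
    """Sketch of a mini-batch semi-stochastic proximal gradient loop.

    Assumed smooth terms: f_i(x) = 0.5*(a_i^T x - y_i)^2 + 0.5*mu*||x||^2.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    if h is None:
        # Heuristic step size from a per-function smoothness bound (assumption).
        L = np.max(np.sum(A ** 2, axis=1)) + mu
        h = 0.1 / L
    if m is None:
        m = 2 * n  # maximum number of stochastic steps per epoch (assumption)

    def batch_grad(x, idx):
        # Average gradient of the f_i with i in idx.
        Ab = A[idx]
        return Ab.T @ (Ab @ x - y[idx]) / len(idx) + mu * x

    x = np.zeros(d)
    all_idx = np.arange(n)
    for _ in range(epochs):
        # Deterministic step: full gradient at the epoch's starting point.
        g = batch_grad(x, all_idx)
        w = x.copy()
        # Number of stochastic steps; uniform sampling is an assumption here.
        t_k = rng.integers(1, m + 1)
        for _ in range(t_k):
            # Sample b functions without replacement.
            idx = rng.choice(n, size=b, replace=False)
            # Variance-reduced direction built from the mini-batch.
            direction = g + batch_grad(w, idx) - batch_grad(x, idx)
            # Proximal step handles the nonsmooth regularizer.
            w = soft_threshold(w - h * direction, h * lam)
        x = w  # last iterate becomes the new starting point
    return x
```

In this sketch the two calls to batch_grad inside the inner loop are the part that could be parallelized across the b sampled functions, which is where the second speedup effect mentioned above would enter.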
