Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

Jakub Konečný; Jie Liu; Peter Richtárik; Martin Takáč

arXiv:1504.04407·cs.LG·April 20, 2016

TL;DR

This paper introduces mS2GD, a mini-batch variant of semi-stochastic gradient descent (S2GD) for minimizing strongly convex composite objectives over large datasets. Mini-batching improves both its theoretical complexity and its practical performance, and each mini-batch step is easy to parallelize.

Contribution

The paper introduces a mini-batching scheme for S2GD, analyzes its complexity, and shows that the scheme both improves efficiency and admits a parallel implementation.

Findings

01

As long as the mini-batch size b stays below a certain threshold, mini-batching reduces the overall computational work needed to reach a fixed accuracy.

02

The method benefits from two speedup effects: less total work, and a simple parallel implementation of the mini-batch steps.

03

The method is well suited to large-scale convex optimization on parallel hardware.
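One way to see why a larger b can help is that each stochastic step averages b sampled terms, and averaging makes the direction less noisy. The sketch below is our illustration, not code from the paper: it assumes the b indices are drawn uniformly without replacement and computes, by enumerating every subset, the exact variance of a size-b mini-batch mean of a set of vectors, then checks it against the standard closed form (n - b) / (b (n - 1)) · σ². The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def minibatch_mean_variance(vectors, b):
    """Exact E||mean over S - overall mean||^2 for a uniformly random
    size-b subset S drawn without replacement (enumerates all subsets)."""
    n = len(vectors)
    mu = vectors.mean(axis=0)
    sq_errors = [
        np.sum((vectors[list(s)].mean(axis=0) - mu) ** 2)
        for s in combinations(range(n), b)
    ]
    return float(np.mean(sq_errors))


def predicted_variance(vectors, b):
    """Closed form for sampling without replacement:
    (n - b) / (b * (n - 1)) * sigma^2, with sigma^2 the population variance."""
    n = len(vectors)
    sigma2 = np.mean(np.sum((vectors - vectors.mean(axis=0)) ** 2, axis=1))
    return float((n - b) / (b * (n - 1)) * sigma2)
```

For example, with 8 random vectors the variance falls steadily from σ² at b = 1 to exactly 0 at b = 8, where the mini-batch is the whole set. Whether this lower variance actually translates into less total work depends on the per-step cost of b gradients, which is the trade-off the paper's threshold on b captures.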

Abstract

We propose mS2GD: a method incorporating a mini-batching scheme for improving the theoretical complexity and practical performance of semi-stochastic gradient descent (S2GD). We consider the problem of minimizing a strongly convex function represented as the sum of an average of a large number of smooth convex functions and a simple nonsmooth convex regularizer. Our method first performs a deterministic step (computation of the gradient of the objective function at the starting point), followed by a large number of stochastic steps. The process is repeated a few times, with the last iterate becoming the new starting point. The novelty of our method is the introduction of mini-batching into the computation of stochastic steps. In each step, instead of choosing a single function, we sample $b$ functions, compute their gradients, and compute the direction from them. We analyze the…
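The outer/inner loop structure described in the abstract can be sketched in Python with NumPy. This is a minimal sketch under assumptions that are ours, not the paper's: each smooth component f_i is a ridge-regularized least-squares loss, the nonsmooth regularizer is λ₁‖x‖₁ (whose proximal operator is soft-thresholding), the inner loop has a fixed length, the step size is constant, and mini-batches are drawn without replacement. The names `ms2gd_sketch`, `soft_threshold` and `grad_rows` are hypothetical.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def ms2gd_sketch(A, labels, lam1, lam2, b, step, outer_iters, inner_iters, seed=0):
    """mS2GD-style sketch (assumed setting, not the paper's exact algorithm) for
    P(x) = (1/n) sum_i [0.5*(a_i^T x - labels_i)^2 + 0.5*lam2*||x||^2] + lam1*||x||_1.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape

    def grad_rows(x, idx):
        # Gradients of the smooth components f_i for the rows in idx, shape (len(idx), d)
        residual = A[idx] @ x - labels[idx]
        return residual[:, None] * A[idx] + lam2 * x

    x_tilde = np.zeros(d)
    for _ in range(outer_iters):
        # Deterministic step: full gradient of the smooth part at the starting point
        full_grad = grad_rows(x_tilde, np.arange(n)).mean(axis=0)
        x = x_tilde.copy()
        for _ in range(inner_iters):
            batch = rng.choice(n, size=b, replace=False)
            # Mini-batch direction built from b sampled gradient differences
            v = (grad_rows(x, batch) - grad_rows(x_tilde, batch)).mean(axis=0) + full_grad
            # Proximal step on the nonsmooth regularizer
            x = soft_threshold(x - step * v, step * lam1)
        x_tilde = x  # the last iterate becomes the new starting point
    return x_tilde
```

Each inner step touches only b rows, and the b per-row gradients computed inside `grad_rows(x, batch)` are independent of one another, which is where a parallel implementation would split the work. A step size on the order of a small fraction of 1 / max_i(‖a_i‖² + λ₂) is a conservative choice for this sketch.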
