Without-Replacement Sampling for Stochastic Gradient Methods: Convergence Results and Application to Distributed Optimization

Ohad Shamir

arXiv:1603.00570 · cs.LG · October 18, 2016 · 19 cites

Open Access

TL;DR

This paper analyzes the convergence of stochastic gradient methods under without-replacement sampling, which is common in practice, often easier to implement, and often performs better than the with-replacement sampling assumed in most analyses. It provides convergence guarantees and applies them to distributed optimization.

Contribution

It provides competitive convergence guarantees for without-replacement sampling for three types of algorithms (any algorithm with online regret guarantees, stochastic gradient descent, and SVRG) and derives a nearly-optimal distributed algorithm for regularized least squares.

Findings

01. Provides convergence guarantees for without-replacement sampling

02. Develops a distributed algorithm for regularized least squares

03. Achieves near-optimal communication and runtime complexity

Abstract

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In practice, however, sampling without replacement is very common, easier to implement in many cases, and often performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling, under various scenarios, for three types of algorithms: any algorithm with online regret guarantees, stochastic gradient descent, and SVRG. A useful application of our SVRG analysis is a nearly-optimal algorithm for regularized least squares in a distributed setting, in terms of both communication complexity and runtime complexity, when the data is randomly partitioned and the condition number can be as large as the data size per machine (up to logarithmic factors). Our proof techniques combine ideas from…
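To make the distinction between the two sampling schemes concrete, here is a minimal sketch of stochastic gradient descent on a one-dimensional regularized least squares problem. It is an illustration only, not the paper's algorithm or analysis: the synthetic data, the helper names (`make_data`, `index_stream`, `sgd_ridge`, `ridge_solution`), the regularization value, and the step-size schedule are all assumptions chosen for the example.

```python
import random


def make_data(n, seed=0):
    """Hypothetical synthetic data: y is roughly 2*x plus small Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def index_stream(n, epochs, without_replacement, rng):
    """Yield the data indices visited by SGD, n indices per epoch."""
    for _ in range(epochs):
        if without_replacement:
            # One random permutation per epoch: every point is used exactly once.
            order = list(range(n))
            rng.shuffle(order)
            yield from order
        else:
            # Independent uniform draws: points may repeat or be skipped.
            for _ in range(n):
                yield rng.randrange(n)


def sgd_ridge(xs, ys, lam, epochs, without_replacement, seed=0):
    """SGD on (1/n) * sum_i [0.5*(w*x_i - y_i)^2 + 0.5*lam*w^2] for scalar w."""
    rng = random.Random(seed)
    w = 0.0
    stream = index_stream(len(xs), epochs, without_replacement, rng)
    for t, i in enumerate(stream, start=1):
        grad = (w * xs[i] - ys[i]) * xs[i] + lam * w
        # Assumed schedule: decays like 1/(lam*t), capped near 1 for early steps.
        w -= grad / (lam * t + 1.0)
    return w


def ridge_solution(xs, ys, lam):
    """Closed-form minimizer of the same objective, for comparison."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + n * lam)


if __name__ == "__main__":
    xs, ys = make_data(200)
    lam = 0.1
    print("closed form        :", ridge_solution(xs, ys, lam))
    print("with replacement   :", sgd_ridge(xs, ys, lam, 20, False))
    print("without replacement:", sgd_ridge(xs, ys, lam, 20, True))
```

The only difference between the two runs is how `index_stream` picks indices; the update rule is identical. Shuffling once per epoch is one common way to sample without replacement, and a single run like this one does not by itself show which scheme converges faster.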

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Sparse and Compressive Sensing Techniques