Batched Stochastic Gradient Descent with Weighted Sampling

Deanna Needell; Rachel Ward

arXiv:1608.07641·math.NA·March 2, 2017

TL;DR

This paper analyzes a batched stochastic gradient descent method with weighted sampling that provably accelerates convergence for both smooth and non-smooth objective functions, supported by theoretical analysis and experimental validation.

Contribution

The paper combines batching and weighted sampling in SGD, proposes computationally efficient schemes for approximating the optimal weights, and demonstrates a significant speedup over using either batching or weighted sampling alone.

Findings

1. Speedup in convergence rate with batched weighted sampling
2. Efficient schemes for approximating optimal weights
3. Experimental validation showing substantial gains

Abstract

We analyze a batched variant of Stochastic Gradient Descent (SGD) with weighted sampling distribution for smooth and non-smooth objective functions. We show that by distributing the batches computationally, a significant speedup in the convergence rate is provably possible compared to either batched sampling or weighted sampling alone. We propose several computationally efficient schemes to approximate the optimal weights, and compute proposed sampling distributions explicitly for the least squares and hinge loss problems. We show both analytically and experimentally that substantial gains can be obtained.
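
To make the idea in the abstract concrete, here is a minimal sketch of batched SGD with weighted batch sampling for a least squares problem. It is an illustration under stated assumptions, not the authors' exact scheme: the function name `batched_weighted_sgd`, the fixed consecutive batches, the use of the squared Frobenius norm of each batch block as the sampling weight, and the default step size are all choices made here for the example.

```python
import random


def batched_weighted_sgd(A, b, batch_size, iters, step=None, seed=0):
    """Sketch: minimize (1/m) * sum_k 0.5 * ||A_k x - b_k||^2 over m row batches."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])

    # Assumption: rows are partitioned into fixed, consecutive batches.
    batches = [list(range(s, min(s + batch_size, n)))
               for s in range(0, n, batch_size)]
    m = len(batches)

    # Assumption: each batch is weighted by the squared Frobenius norm of its
    # rows, a cheap upper bound on the batch gradient's Lipschitz constant.
    weights = [sum(A[i][j] ** 2 for i in batch for j in range(d))
               for batch in batches]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Assumption: a conservative default step size derived from the weights.
    if step is None:
        step = m / total

    x = [0.0] * d
    for _ in range(iters):
        # Draw one batch with probability proportional to its weight.
        k = rng.choices(range(m), weights=probs)[0]

        # Gradient of the batch objective 0.5 * ||A_k x - b_k||^2.
        grad = [0.0] * d
        for i in batches[k]:
            residual = sum(A[i][j] * x[j] for j in range(d)) - b[i]
            for j in range(d):
                grad[j] += residual * A[i][j]

        # Dividing by m * probs[k] keeps the gradient estimate unbiased
        # despite the non-uniform sampling.
        scale = step / (m * probs[k])
        x = [x[j] - scale * grad[j] for j in range(d)]
    return x


if __name__ == "__main__":
    A = [[1, 0], [0, 1], [2, 1], [1, 3], [3, -1], [1, 1], [0, 2], [4, 1]]
    x_true = [1.0, -2.0]
    b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
    print(batched_weighted_sgd(A, b, batch_size=2, iters=500))
```

The reweighting by `1 / (m * probs[k])` is the key design point: batches with larger weights are sampled more often but take proportionally smaller steps, so the expected update still follows the full gradient.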
