Accelerating Minibatch Stochastic Gradient Descent using Stratified Sampling

Peilin Zhao; Tong Zhang

arXiv:1405.3080·stat.ML·May 14, 2014·83 cites

TL;DR

This paper introduces a stratified sampling method for minibatch SGD that reduces variance and accelerates convergence by dividing data into low-variance clusters, with experimental results confirming its effectiveness.

Contribution

The paper proposes a novel stratified sampling strategy for minibatch SGD that improves convergence rates over uniform sampling.

Findings

01 Convergence rate is significantly improved with stratified sampling.

02 Experimental results demonstrate faster training times.

03 Variance reduction leads to more stable updates.

Abstract

Stochastic Gradient Descent (SGD) is a popular optimization method which has been applied to many important machine learning tasks such as Support Vector Machines and Deep Neural Networks. In order to parallelize SGD, minibatch training is often employed. The standard approach is to uniformly sample a minibatch at each step, which often leads to high variance. In this paper we propose a stratified sampling strategy, which divides the whole dataset into clusters with low within-cluster variance; we then take examples from these clusters using a stratified sampling technique. It is shown that the convergence rate can be significantly improved by the algorithm. Encouraging experimental results confirm the effectiveness of the proposed method.
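
The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes cluster labels are already available (for example from k-means), draws the same number of examples from every cluster, and reweights them so the minibatch gradient stays an unbiased estimate of the full-data average gradient. The names `stratified_minibatch`, `weighted_gradient`, `labels`, and `batch_per_cluster` are hypothetical, and the paper's actual per-cluster allocation rule is not reproduced here.

```python
import numpy as np


def stratified_minibatch(labels, batch_per_cluster, rng):
    """Draw a stratified minibatch (illustrative sketch, not the paper's method).

    labels: 1-D integer array, labels[i] is the cluster of example i
            (assumed precomputed).
    Returns (indices, weights); the weights sum to 1 and make the weighted
    sum of per-example gradients unbiased for the full-data mean gradient.
    """
    n = labels.shape[0]
    idx_parts, weight_parts = [], []
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        # Never ask for more examples than the cluster holds.
        b_k = min(batch_per_cluster, members.size)
        chosen = rng.choice(members, size=b_k, replace=False)
        idx_parts.append(chosen)
        # Each draw from cluster k stands in for n_k / b_k examples out of n.
        weight_parts.append(np.full(b_k, members.size / (n * b_k)))
    return np.concatenate(idx_parts), np.concatenate(weight_parts)


def weighted_gradient(per_example_grads, indices, weights):
    """Combine the sampled per-example gradients into one estimate."""
    return weights @ per_example_grads[indices]


# Toy check: when gradients are constant inside each cluster, the stratified
# estimate has zero variance and equals the full-data mean gradient.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1])
grads = np.array([[1.0], [1.0], [1.0], [5.0]])
idx, w = stratified_minibatch(labels, batch_per_cluster=1, rng=rng)
estimate = weighted_gradient(grads, idx, w)
full_mean = grads.mean(axis=0)
```

In this toy case a uniformly sampled minibatch of two examples would often miss the second cluster entirely, whereas the stratified draw always includes it. That is the variance-reduction effect the abstract describes, shown here under the simplifying assumptions stated above.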

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Bandit Algorithms Research

Methods: Stochastic Gradient Descent