Variance Reduction in SGD by Distributed Importance Sampling

Guillaume Alain; Alex Lamb; Chinnadhurai Sankar; Aaron Courville; Yoshua Bengio

arXiv:1511.06481 · stat.ML · April 19, 2016 · 87 cites

TL;DR

This paper introduces a distributed importance sampling framework for stochastic gradient descent that reduces gradient variance by selecting the most informative training examples, leading to faster and more stable deep learning training.

Contribution

The paper presents a novel distributed importance sampling method for SGD that minimizes gradient variance and is effective even with synchronization delays across machines.

Findings

01. Significant reduction in gradient variance observed.

02. Improved training stability and convergence speed.

03. Effective in distributed settings with synchronization costs.

Abstract

Humans are able to accelerate their learning by selecting training materials that are the most informative and at the appropriate level of difficulty. We propose a framework for distributing deep learning in which one set of workers search for the most informative examples in parallel while a single worker updates the model on examples selected by importance sampling. This leads the model to update using an unbiased estimate of the gradient which also has minimum variance when the sampling proposal is proportional to the L2-norm of the gradient. We show experimentally that this method reduces gradient variance even in a context where the cost of synchronization across machines cannot be ignored, and where the factors for importance sampling are not updated instantly across the training set.
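The variance claim in the abstract can be checked numerically on toy data. The sketch below is not the authors' implementation: the gradients are synthetic, the helper `is_mean_and_variance` is a hypothetical name, and the staleness model and smoothing constant are assumptions made only for illustration. It computes, in closed form, the mean and total variance of the single-sample estimator g_i / (N q_i) under three proposals: uniform, proportional to the current gradient L2-norms, and proportional to out-of-date norms.

```python
import numpy as np

def is_mean_and_variance(grads, q):
    """Exact mean and total variance (trace of the covariance) of the
    single-sample estimator g_i / (N * q_i), with index i drawn from q."""
    n = grads.shape[0]
    scaled = grads / (n * q[:, None])          # one candidate estimate per example
    mean = (q[:, None] * scaled).sum(axis=0)   # expectation under q
    second_moment = (q * (scaled ** 2).sum(axis=1)).sum()
    return mean, second_moment - (mean ** 2).sum()

rng = np.random.default_rng(0)
n_examples = 1000

# Synthetic per-example gradients with widely varying norms (an assumption).
grads = rng.normal(size=(n_examples, 5)) * rng.lognormal(size=(n_examples, 1))
true_mean = grads.mean(axis=0)

fresh_norms = np.linalg.norm(grads, axis=1)
# Hypothetical staleness model: norms that were computed under older parameters.
stale_norms = fresh_norms * rng.lognormal(sigma=0.5, size=n_examples)
# Assumed smoothing constant that keeps every probability away from zero.
smoothing = 0.1 * stale_norms.mean()

q_uniform = np.full(n_examples, 1.0 / n_examples)
q_fresh = fresh_norms / fresh_norms.sum()
q_stale = (stale_norms + smoothing) / (stale_norms + smoothing).sum()

mean_uniform, var_uniform = is_mean_and_variance(grads, q_uniform)
mean_fresh, var_fresh = is_mean_and_variance(grads, q_fresh)
mean_stale, var_stale = is_mean_and_variance(grads, q_stale)

print("uniform proposal variance:   ", var_uniform)
print("fresh-norm proposal variance:", var_fresh)
print("stale-norm proposal variance:", var_stale)
```

All three proposals give the same mean, because dividing by N q_i cancels the sampling bias for any proposal that assigns positive probability to every example. Only the variance differs, and the fresh-norm proposal attains the smallest total variance of the three. Where the stale proposal falls relative to uniform sampling depends on how out of date the norms are, which is why this sketch makes no claim about that ordering.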

Code & Models

Repositories

idiap/importance-sampling (tf)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques