Biased Importance Sampling for Deep Neural Network Training

Angelos Katharopoulos, François Fleuret

arXiv:1706.00043 · cs.LG · September 14, 2017 · 48 citations

TL;DR

This paper introduces a biased importance sampling method for deep neural network training that uses loss-based importance metrics and a small auxiliary model, leading to faster training and improved generalization.

Contribution

The paper proposes using per-sample loss values as the importance metric and approximating them with a small auxiliary model trained in parallel, which makes importance sampling cheap enough to accelerate deep network training.
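
A minimal sketch of loss-based sampling, under stated assumptions. The small auxiliary model is replaced here by a hypothetical `EmaLossPredictor` that keeps an exponential moving average of each sample's observed loss; this is a deliberate simplification, not the paper's model. `sample_by_loss` is likewise a hypothetical helper: it draws a batch with probability p_i proportional to the predicted loss and returns weights 1/(N p_i), which keep the gradient estimate unbiased. The `smoothing` constant is an added assumption that keeps every probability nonzero.

```python
import random
from typing import List, Optional, Sequence, Tuple


class EmaLossPredictor:
    """Hypothetical stand-in for the small auxiliary model.

    Tracks an exponential moving average of each sample's observed loss
    instead of training a network (a simplifying assumption).
    """

    def __init__(self, n_samples: int, init_loss: float = 1.0, decay: float = 0.9):
        self.scores = [init_loss] * n_samples
        self.decay = decay

    def predict(self) -> List[float]:
        # Current loss estimate for every training sample.
        return list(self.scores)

    def update(self, indices: Sequence[int], losses: Sequence[float]) -> None:
        # Blend the losses observed on the last batch into the running estimate.
        for i, loss in zip(indices, losses):
            self.scores[i] = self.decay * self.scores[i] + (1.0 - self.decay) * loss


def sample_by_loss(
    predicted_losses: Sequence[float],
    batch_size: int,
    smoothing: float = 1e-3,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[float]]:
    """Draw a batch with p_i proportional to the smoothed predicted loss."""
    rng = rng or random.Random()
    n = len(predicted_losses)
    scores = [max(loss, 0.0) + smoothing for loss in predicted_losses]
    total = sum(scores)
    probs = [s / total for s in scores]
    # Sample with replacement according to the importance distribution.
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    # Weights 1 / (N p_i) make the weighted mean gradient unbiased.
    weights = [1.0 / (n * probs[i]) for i in indices]
    return indices, weights


if __name__ == "__main__":
    predictor = EmaLossPredictor(n_samples=5)
    predictor.update([0, 1], [4.0, 0.1])  # sample 0 looks hard, sample 1 easy
    idx, w = sample_by_loss(predictor.predict(), batch_size=4, rng=random.Random(0))
    print(idx, w)
```

Multiplying each sampled example's loss by its weight before backpropagation gives an unbiased estimate of the mean gradient over the whole dataset, while feeding the losses actually observed on each batch back into the predictor keeps the importance scores current.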

Findings

1. 30% faster training of a CNN on CIFAR10 than with uniform sampling
2. Effective use of loss-based importance sampling
3. Applicable to both image and language tasks

Abstract

Importance sampling has been successfully used to accelerate stochastic optimization in many convex problems. However, the lack of an efficient way to calculate the importance still hinders its application to Deep Learning. In this paper, we show that the loss value can be used as an alternative importance metric, and propose a way to efficiently approximate it for a deep model, using a small model trained for that purpose in parallel. This method allows in particular to utilize a biased gradient estimate that implicitly optimizes a soft max-loss, and leads to better generalization performance. While such method suffers from a prohibitively high variance of the gradient estimate when using a standard stochastic optimizer, we show that when it is combined with our sampling mechanism, it results in a reliable procedure. We showcase the generality of our method by testing it on both…
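
One way to read the "soft max-loss" remark, under stated assumptions. Take sampling probabilities p_i = L_i / Σ_j L_j and per-sample weights (1/(N p_i))^k, where the exponent k ∈ [0, 1] is a hypothetical knob introduced for this illustration. With k = 1, the expected gradient is the uniform average (1/N) Σ_i ∇L_i. With k = 0 (no reweighting), it is Σ_i L_i ∇L_i / Σ_j L_j = (‖L‖₂/‖L‖₁) ∇‖L‖₂, a positive multiple of the gradient of the L2 norm of the per-sample losses. That norm weights high-loss samples more heavily, as a smoothed maximum would. The sketch below computes this expectation exactly for scalar per-sample gradients. It illustrates the idea and is not the authors' code.

```python
from typing import Sequence


def expected_gradient(
    losses: Sequence[float], grads: Sequence[float], k: float
) -> float:
    """Exact expectation of the importance-sampled gradient.

    Assumes strictly positive losses, p_i = L_i / sum_j L_j and a
    hypothetical weight (1 / (N p_i)) ** k. k = 1 is unbiased; k = 0 drops
    the reweighting. Scalar per-sample gradients keep the example small.
    """
    n = len(losses)
    total = sum(losses)
    expectation = 0.0
    for loss, grad in zip(losses, grads):
        p = loss / total
        weight = (1.0 / (n * p)) ** k
        # Each sample contributes p_i * w_i * g_i to the expectation.
        expectation += p * weight * grad
    return expectation


losses = [0.1, 0.5, 2.0]
grads = [1.0, -1.0, 3.0]
print(expected_gradient(losses, grads, k=1.0))  # equals the uniform mean of grads
print(expected_gradient(losses, grads, k=0.0))  # loss-weighted average of grads
```

Intermediate values of k trade the variance reduction of loss-proportional sampling against how strongly the objective is tilted toward high-loss samples.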

Code & Models

Repositories

idiap/importance-sampling · tf

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning