Biased Importance Sampling for Deep Neural Network Training
Angelos Katharopoulos, François Fleuret

TL;DR
This paper introduces a biased importance sampling method for deep neural network training that uses loss-based importance metrics and a small auxiliary model, leading to faster training and improved generalization.
Contribution
It proposes an efficient importance sampling technique based on loss values, combined with a small auxiliary model, to accelerate deep learning training.
Findings
30% faster training of a CNN on CIFAR10
Effective use of loss-based importance sampling
Applicable to both image and language tasks
Abstract
Importance sampling has been successfully used to accelerate stochastic optimization in many convex problems. However, the lack of an efficient way to calculate the importance still hinders its application to Deep Learning. In this paper, we show that the loss value can be used as an alternative importance metric, and propose a way to efficiently approximate it for a deep model, using a small model trained for that purpose in parallel. In particular, this method makes it possible to use a biased gradient estimate that implicitly optimizes a soft max-loss and leads to better generalization performance. While such a method suffers from a prohibitively high variance of the gradient estimate when using a standard stochastic optimizer, we show that when it is combined with our sampling mechanism, it results in a reliable procedure. We showcase the generality of our method by testing it on both…
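The sampling idea described in the abstract can be illustrated with a minimal sketch, assuming NumPy. It is not the authors' implementation: the function names, the `smoothing` parameter, and the `bias_exponent` parameter are hypothetical choices made for illustration, and `loss_estimates` stands in for whatever per-example loss predictions the small auxiliary model would produce. Examples are drawn with probability proportional to their estimated loss. With `bias_exponent=1.0` the returned weights are the standard importance weights that keep the gradient estimate unbiased. Smaller values leave part of the oversampling uncorrected, so high-loss examples count for more, which is one way to picture the biased estimate the abstract mentions.

```python
import numpy as np


def sampling_probabilities(loss_estimates, smoothing=0.0):
    """Turn non-negative per-example loss estimates into a sampling distribution.

    `smoothing` (hypothetical parameter) is added to every score so that
    low-loss examples keep a non-zero chance of being drawn.
    """
    scores = np.asarray(loss_estimates, dtype=np.float64) + smoothing
    if np.any(scores < 0):
        raise ValueError("loss estimates must be non-negative")
    total = scores.sum()
    if total <= 0:
        # All scores are zero: fall back to uniform sampling.
        return np.full(scores.shape[0], 1.0 / scores.shape[0])
    return scores / total


def sample_batch(loss_estimates, batch_size, bias_exponent=1.0,
                 smoothing=0.0, rng=None):
    """Draw a mini-batch with probability proportional to estimated loss.

    Returns the sampled indices and one weight per sampled example.
    bias_exponent=1.0 gives weights 1 / (N * p_i), the unbiased correction.
    bias_exponent<1.0 (hypothetical knob) under-corrects, so the weighted
    gradient emphasises high-loss examples and is therefore biased.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = sampling_probabilities(loss_estimates, smoothing)
    n = probs.shape[0]
    indices = rng.choice(n, size=batch_size, replace=True, p=probs)
    # Sampled indices always have p_i > 0, so the division is safe.
    weights = 1.0 / (n * probs[indices]) ** bias_exponent
    return indices, weights
```

In a training loop, the per-example losses of the sampled batch would be multiplied by `weights` before averaging and backpropagating, and the auxiliary model would be refreshed with the newly observed losses; both steps are left out of this sketch.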
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
