Accelerating Deep Learning by Focusing on the Biggest Losers

Angela H. Jiang; Daniel L.-K. Wong; Giulio Zhou; David G. Andersen; Jeffrey Dean; Gregory R. Ganger; Gauri Joshi; Michael Kaminsky; Michael Kozuch; Zachary C. Lipton; Padmanabhan Pillai

arXiv:1910.00762 · cs.LG · October 3, 2019 · 36 cites

TL;DR

This paper presents Selective-Backprop, a method that speeds up deep neural network training by spending backpropagation mainly on high-loss examples. It reduces computation while still reaching target error rates, and it converges faster than existing importance sampling techniques.

Contribution

The paper introduces Selective-Backprop, a novel technique that accelerates training by selectively performing backpropagation on high-loss examples, reducing computational cost.
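The selection decision can be illustrated with a small stand-alone helper. This is a minimal sketch, not the authors' implementation: it assumes selection is probabilistic, with an example's chance of being backpropagated growing with where its loss ranks among recently seen losses (the full paper defines its own rule, which may differ in detail). The class name `PercentileSelector` and the parameters `beta` (how strongly selection favors high losses) and `history_size` (length of the loss window) are hypothetical.

```python
import bisect
from collections import deque


class PercentileSelector:
    """Hypothetical helper: map a loss to a selection probability.

    Assumption: P(select) = (fraction of recent losses <= this loss) ** beta.
    """

    def __init__(self, beta: float = 2.0, history_size: int = 1024) -> None:
        self.beta = beta
        self.window: deque = deque(maxlen=history_size)  # recent losses, oldest first
        self.sorted_losses: list = []  # same values, kept sorted for binary search

    def probability(self, loss: float) -> float:
        if len(self.window) == self.window.maxlen:
            # The deque will drop its oldest entry on append; mirror that here.
            oldest = self.window[0]
            del self.sorted_losses[bisect.bisect_left(self.sorted_losses, oldest)]
        self.window.append(loss)
        bisect.insort(self.sorted_losses, loss)
        # Fraction of the window at or below this loss (the loss itself counts).
        rank = bisect.bisect_right(self.sorted_losses, loss)
        return (rank / len(self.sorted_losses)) ** self.beta
```

Keeping a sorted copy of the window makes each percentile lookup a binary search; insertion and eviction remain linear in the window size, which is acceptable for a sketch.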

Findings

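01

Converges to target error rates up to 3.5x faster than standard SGD across CIFAR10, CIFAR100, and SVHN.

02

Converges 1.02x to 1.8x faster than a state-of-the-art importance sampling approach.

03

Gives a further 26% speedup by selecting examples with stale forward-pass results, which also skips forward passes.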
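The stale-selection idea can be pictured as follows. Assuming each example's loss from an earlier epoch is cached, selection can be decided from that cached value alone, so a skipped example costs neither a forward nor a backward pass. The sketch is hypothetical (`stale_select` and the dictionary cache are not from the paper), and it assumes examples with no cached loss are always selected so they eventually receive one.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence


def stale_select(
    example_ids: Sequence[Hashable],
    cached_losses: Dict[Hashable, float],
    select_prob: Callable[[float], float],
    rng: random.Random,
) -> List[Hashable]:
    """Choose examples for this epoch from losses cached in an earlier one.

    No fresh forward pass is run here; that is where the extra savings come from.
    """
    chosen: List[Hashable] = []
    for ex_id in example_ids:
        loss = cached_losses.get(ex_id)
        # Assumption: examples without a cached loss are always chosen.
        if loss is None or rng.random() < select_prob(loss):
            chosen.append(ex_id)
    return chosen
```

After the training step, losses computed for the chosen examples can be written back into `cached_losses`, while the others keep their older values until they are next evaluated.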

Abstract

This paper introduces Selective-Backprop, a technique that accelerates the training of deep neural networks (DNNs) by prioritizing examples with high loss at each iteration. Selective-Backprop uses the output of a training example's forward pass to decide whether to use that example to compute gradients and update parameters, or to skip immediately to the next example. By reducing the number of computationally expensive backpropagation steps performed, Selective-Backprop accelerates training. Evaluation on CIFAR10, CIFAR100, and SVHN, across a variety of modern image models, shows that Selective-Backprop converges to target error rates up to 3.5x faster than with standard SGD and between 1.02x and 1.8x faster than a state-of-the-art importance sampling approach. Further acceleration of 26% can be achieved by using stale forward pass results for selection, thus also skipping forward passes…
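The per-example decision described in the abstract can be sketched as one training epoch in plain Python. Everything here is an assumption for illustration: `forward_loss`, `select_prob`, and `backward_step` are hypothetical callables standing in for a model's loss-only forward pass, a selection rule, and a gradient update on a batch. Selected examples are queued until a full batch is ready, so the expensive backward pass still runs on batches of the usual size.

```python
import random
from typing import Callable, List, Sequence


def selective_backprop_epoch(
    examples: Sequence[object],
    forward_loss: Callable[[object], float],
    select_prob: Callable[[float], float],
    backward_step: Callable[[List[object]], None],
    batch_size: int,
    rng: random.Random,
) -> int:
    """One epoch of loss-based selection (illustrative sketch only).

    Returns how many examples were passed to backward_step.
    """
    pending: List[object] = []
    backpropped = 0
    for example in examples:
        loss = forward_loss(example)  # forward pass only, no gradients needed
        # rng.random() is in [0, 1), so probability 1.0 always selects and 0.0 never does.
        if rng.random() < select_prob(loss):
            pending.append(example)
        if len(pending) == batch_size:
            backward_step(pending)  # hypothetical: compute gradients, update parameters
            backpropped += len(pending)
            pending = []
    if pending:
        # Assumption: flush the final partial batch rather than dropping it.
        backward_step(pending)
        backpropped += len(pending)
    return backpropped
```

Passing `select_prob=lambda loss: 1.0` backpropagates every example, which gives a plain-SGD baseline for comparing backward-pass counts.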

Code & Models

Videos

Accelerating Deep Learning by Focusing on the Biggest Losers · YouTube

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Stochastic Gradient Descent