Sparse Networks from Scratch: Faster Training without Losing Performance

Tim Dettmers; Luke Zettlemoyer

arXiv:1907.04840 · cs.LG · August 27, 2019 · 191 cites



TL;DR

This paper introduces sparse momentum, an algorithm for training sparse neural networks that speeds up training by up to 5.61x while matching dense performance levels on MNIST, CIFAR-10, and ImageNet.

Contribution

The paper presents sparse momentum, a sparse learning method that keeps weights sparse throughout training and uses exponentially smoothed gradients (momentum) to decide which layers and which weights should receive new parameters.

Findings

01. Achieves state-of-the-art sparse performance on MNIST, CIFAR-10, and ImageNet, reducing mean error by a relative 8%, 15%, and 6% compared to other sparse algorithms.

02. Provides up to 5.61x faster training without performance loss.

03. Is robust to hyperparameter choices and easy to use.

Abstract

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels. We accomplish this by developing sparse momentum, an algorithm which uses exponentially smoothed gradients (momentum) to identify layers and weights which reduce the error efficiently. Sparse momentum redistributes pruned weights across layers according to the mean momentum magnitude of each layer. Within a layer, sparse momentum grows weights according to the momentum magnitude of zero-valued weights. We demonstrate state-of-the-art sparse performance on MNIST, CIFAR-10, and ImageNet, decreasing the mean error by a relative 8%, 15%, and 6% compared to other sparse algorithms. Furthermore, we show that sparse momentum reliably reproduces dense performance levels while providing up to 5.61x…
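The redistribute-and-regrow cycle described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The abstract does not state the pruning criterion, so the sketch assumes that the smallest-magnitude active weights are pruned at a fixed prune_rate. It also assumes that a layer's mean momentum magnitude is taken over its active weights, that regrown weights start at zero, and that any budget lost to rounding is simply dropped. The function name sparse_momentum_step and the flat per-layer lists (weights, momentum, masks) are hypothetical names chosen for illustration.

```python
from typing import Dict, List


def sparse_momentum_step(
    weights: Dict[str, List[float]],
    momentum: Dict[str, List[float]],
    masks: Dict[str, List[bool]],
    prune_rate: float = 0.2,
) -> Dict[str, int]:
    """Run one prune / redistribute / regrow cycle in place.

    Each layer is a flat list; masks[name][i] is True when weight i is active.
    Returns the number of weights regrown in each layer.
    """
    # 1) Prune (assumption: drop the smallest-magnitude active weights per layer).
    pruned_total = 0
    for name, w in weights.items():
        active = [i for i, on in enumerate(masks[name]) if on]
        k = int(prune_rate * len(active))
        for i in sorted(active, key=lambda j: abs(w[j]))[:k]:
            masks[name][i] = False
            w[i] = 0.0
        pruned_total += k

    # 2) Redistribute: score each layer by its mean momentum magnitude
    #    (assumption: the mean is taken over the layer's active weights).
    mean_mag = {}
    for name, mom in momentum.items():
        mags = [abs(mom[i]) for i, on in enumerate(masks[name]) if on]
        mean_mag[name] = sum(mags) / len(mags) if mags else 0.0
    total = sum(mean_mag.values())

    # 3) Regrow: within a layer, enable the inactive weights whose momentum
    #    magnitude is largest. Budgets are floored, so a few pruned weights
    #    may go unassigned in this sketch.
    grown = {}
    for name, mom in momentum.items():
        share = mean_mag[name] / total if total > 0 else 0.0
        budget = int(share * pruned_total)
        inactive = [i for i, on in enumerate(masks[name]) if not on]
        chosen = sorted(inactive, key=lambda j: abs(mom[j]), reverse=True)[:budget]
        for i in chosen:
            masks[name][i] = True  # value stays 0.0 until the optimizer updates it
        grown[name] = len(chosen)
    return grown
```

In a real training loop the momentum values would come from the optimizer's exponentially smoothed gradient buffers, and a step like this would run periodically rather than after every update. How often it runs is a further assumption the abstract does not settle.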



Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning