DSD: Dense-Sparse-Dense Training for Deep Neural Networks
Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally

TL;DR
The paper introduces DSD, a training method involving dense, sparse, and re-dense phases, to improve deep neural network optimization and performance across various architectures and tasks.
Contribution
It proposes a novel dense-sparse-dense training flow that enhances neural network performance without increasing inference complexity.
Findings
Improved accuracy on ImageNet for multiple CNN architectures.
Lower word error rate (WER) for speech recognition on the WSJ'93 dataset.
Better caption generation BLEU scores on Flickr-8K.
Abstract
Modern deep neural networks have a large number of parameters, making them very hard to train. We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance. In the first D (Dense) step, we train a dense network to learn connection weights and importance. In the S (Sparse) step, we regularize the network by pruning the unimportant connections with small weights and retraining the network given the sparsity constraint. In the final D (re-Dense) step, we increase the model capacity by removing the sparsity constraint, re-initialize the pruned parameters from zero and retrain the whole dense network. Experiments show that DSD training can improve the performance for a wide range of CNNs, RNNs and LSTMs on the tasks of image classification, caption generation and speech recognition. On ImageNet, DSD improved the Top1…
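The three steps described in the abstract can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: it assumes a linear model trained with plain gradient descent on mean squared error, and the function names (`prune_mask`, `train`, `dsd`), the 50% sparsity level, the learning rate, and the epoch count are all hypothetical choices made for this example.

```python
import random

def prune_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of smallest-magnitude weights."""
    k = int(len(weights) * sparsity)
    # Sort indices by |w|; the first k are treated as unimportant connections.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else 1.0 for i in range(len(weights))]

def train(weights, data, mask, lr, epochs):
    """Gradient descent on MSE for a linear model; masked weights are held at zero."""
    w = [wi * mi for wi, mi in zip(weights, mask)]
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        # Re-applying the mask after each update enforces the sparsity constraint.
        w = [(wi - lr * gi) * mi for wi, gi, mi in zip(w, grad, mask)]
    return w

def dsd(data, dim, sparsity=0.5, lr=0.05, epochs=200, seed=0):
    """Run the dense, sparse, and re-dense phases in sequence (illustrative only)."""
    rng = random.Random(seed)
    w0 = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    dense_mask = [1.0] * dim
    # D: train the dense model to learn weights and their importance.
    w_dense = train(w0, data, dense_mask, lr, epochs)
    # S: prune small-magnitude weights, then retrain under the mask.
    mask = prune_mask(w_dense, sparsity)
    w_sparse = train(w_dense, data, mask, lr, epochs)
    # re-D: drop the mask; pruned weights restart from zero and everything retrains.
    w_redense = train(w_sparse, data, dense_mask, lr, epochs)
    return w_dense, mask, w_sparse, w_redense

if __name__ == "__main__":
    rng = random.Random(1)
    true_w = [2.0, 0.0, -1.0, 0.0]  # hypothetical ground-truth weights
    data = []
    for _ in range(50):
        x = [rng.uniform(-1, 1) for _ in true_w]
        data.append((x, sum(t * xi for t, xi in zip(true_w, x))))
    w_dense, mask, w_sparse, w_redense = dsd(data, dim=len(true_w))
    print("mask:", mask)
    print("sparse weights:", w_sparse)
    print("re-dense weights:", w_redense)
```

Because the sparse-phase weights are exactly zero at the pruned positions, passing them straight into the final dense phase is what re-initializes the pruned parameters from zero. The final model has the same shape as the original dense one, so inference cost is unchanged.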
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms
Methods: sdsd · 1x1 Convolution · Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Dropout · Dense Connections
