DSD: Dense-Sparse-Dense Training for Deep Neural Networks

Song Han; Jeff Pool; Sharan Narang; Huizi Mao; Enhao Gong; Shijian Tang; Erich Elsen; Peter Vajda; Manohar Paluri; John Tran; Bryan Catanzaro; William J. Dally

arXiv:1607.04381 · cs.CV · February 23, 2017 · 143 cites

TL;DR

The paper introduces DSD, a training method involving dense, sparse, and re-dense phases, to improve deep neural network optimization and performance across various architectures and tasks.

Contribution

It proposes a novel dense-sparse-dense training flow that enhances neural network performance without increasing inference complexity.

Findings

01

Improved accuracy on ImageNet for multiple CNN architectures.

02

Lower word error rate (WER) for speech recognition on the WSJ'93 dataset.

03

Better caption generation BLEU scores on Flickr-8K.

Abstract

Modern deep neural networks have a large number of parameters, making them very hard to train. We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance. In the first D (Dense) step, we train a dense network to learn connection weights and importance. In the S (Sparse) step, we regularize the network by pruning the unimportant connections with small weights and retraining the network given the sparsity constraint. In the final D (re-Dense) step, we increase the model capacity by removing the sparsity constraint, re-initialize the pruned parameters from zero and retrain the whole dense network. Experiments show that DSD training can improve the performance for a wide range of CNNs, RNNs and LSTMs on the tasks of image classification, caption generation and speech recognition. On ImageNet, DSD improved the Top1…
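The three phases described in the abstract can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: it assumes a linear model trained by plain gradient descent on mean squared error, and the function names (`magnitude_mask`, `train`, `dsd`), the 50% sparsity, the step count, and the learning rate are all hypothetical choices made for the example.

```python
import numpy as np


def magnitude_mask(w, sparsity):
    """Boolean mask that drops the `sparsity` fraction of smallest-magnitude weights."""
    k = int(round(sparsity * w.size))          # number of weights to prune
    mask = np.ones(w.size, dtype=bool)
    if k > 0:
        order = np.argsort(np.abs(w), axis=None)  # ascending by magnitude
        mask[order[:k]] = False
    return mask.reshape(w.shape)


def mse(w, X, y):
    """Mean squared error of the linear model X @ w."""
    return float(np.mean((X @ w - y) ** 2))


def train(w, X, y, steps, lr, mask=None):
    """Gradient descent on MSE; if a mask is given, pruned weights stay at zero."""
    w = w.copy()
    if mask is not None:
        w = w * mask
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        if mask is not None:
            grad = grad * mask                 # enforce the sparsity constraint
        w -= lr * grad
    return w


def dsd(X, y, sparsity=0.5, steps=200, lr=0.05, seed=0):
    """Toy dense -> sparse -> re-dense flow (illustrative hyperparameters)."""
    rng = np.random.default_rng(seed)
    w0 = rng.normal(scale=0.1, size=X.shape[1])
    # D: train the dense model to learn weights and their importance.
    w_dense = train(w0, X, y, steps, lr)
    # S: prune small-magnitude weights, then retrain under the mask.
    mask = magnitude_mask(w_dense, sparsity)
    w_sparse = train(w_dense, X, y, steps, lr, mask)
    # re-Dense: drop the mask; pruned weights restart from zero.
    w_redense = train(w_sparse, X, y, steps, lr)
    return w_dense, mask, w_sparse, w_redense
```

Calling `dsd(X, y)` on a feature matrix `X` and target vector `y` returns the weights after each phase along with the pruning mask. Magnitude-based pruning is used here because the abstract identifies unimportant connections by their small weights; a real network would apply the mask per layer.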

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms

Methods: sdsd · 1x1 Convolution · Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Dropout · Dense Connections