Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference
Rishav Mukherji, Mark Schöne, Khaleelulla Khan Nazeer, Christian Mayr, Anand Subramoney

TL;DR
This paper demonstrates that combining activity sparsity with weight sparsity in RNNs significantly reduces computation while maintaining high performance, advancing efficient deep learning and neuromorphic computing applications.
Contribution
It shows that activity sparsity composes multiplicatively with weight sparsity in a GRU-based RNN, reducing computation by up to 20x on Penn Treebank while keeping perplexity below 60.
Findings
Up to 20x reduction in computation on Penn Treebank
Maintained perplexity below 60 with activity sparsity
First to combine activity and weight sparsity effectively in RNNs
Abstract
Artificial neural networks open up unprecedented machine learning capabilities at the cost of ever-growing computational requirements. Sparsifying the parameters, often achieved through weight pruning, has been identified as a powerful technique to compress the number of model parameters and reduce the computational operations of neural networks. Yet, sparse activations, while omnipresent in both biological neural networks and deep learning systems, have not been fully utilized as a compression technique in deep learning. Moreover, the interaction between sparse activations and weight pruning is not fully understood. In this work, we demonstrate that activity sparsity can compose multiplicatively with parameter sparsity in a recurrent neural network model based on the GRU that is designed to be activity sparse. We achieve up to 20x reduction of computation while maintaining…
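To make the phrase "compose multiplicatively" concrete, the following is a minimal sketch of the operation count for a single matrix-vector product when both zero weights and zero activations are skipped. It is not the authors' GRU-based model: the function name effective_macs, the layer size, and both density values are illustrative assumptions, and the weights and activations are random rather than trained.

```python
import numpy as np


def effective_macs(weights: np.ndarray, activations: np.ndarray) -> int:
    """Count multiply-accumulates for weights @ activations when zeros are skipped.

    A multiply is needed only where a nonzero weight meets a nonzero activation,
    i.e. for nonzero weights in the columns selected by active inputs.
    """
    active_columns = activations != 0
    return int(np.count_nonzero(weights[:, active_columns]))


rng = np.random.default_rng(0)

n = 512                  # hypothetical layer width
weight_density = 0.20    # assumed fraction of weights kept after pruning
activity_density = 0.25  # assumed fraction of units active at one time step

# Random masks stand in for a pruned weight matrix and a sparse activation vector.
weights = rng.standard_normal((n, n)) * (rng.random((n, n)) < weight_density)
activations = rng.standard_normal(n) * (rng.random(n) < activity_density)

dense_macs = n * n
sparse_macs = effective_macs(weights, activations)

measured_fraction = sparse_macs / dense_macs
predicted_fraction = weight_density * activity_density

print(f"dense MACs:         {dense_macs}")
print(f"sparse MACs:        {sparse_macs}")
print(f"measured fraction:  {measured_fraction:.4f}")
print(f"predicted fraction: {predicted_fraction:.4f}")
```

Under these assumptions the measured fraction of remaining operations should land close to the product of the two densities, which is the sense in which the two kinds of sparsity multiply. The sketch assumes the weight mask and the activity pattern are statistically independent; whether that holds in a trained network is an empirical question.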
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Neural Networks and Reservoir Computing
Methods: Pruning · Gated Recurrent Unit
