Block-Sparse Recurrent Neural Networks
Sharan Narang, Eric Undersander, Gregory Diamos

TL;DR
This paper explores methods to induce block sparsity in RNNs, achieving high sparsity levels with minimal accuracy loss, thereby reducing model size and improving hardware efficiency for deployment.
Contribution
The paper introduces two approaches, block pruning and group lasso regularization, for creating highly sparse RNNs with practical hardware benefits.
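As a rough illustration of what block pruning means in practice, the sketch below zeroes out whole blocks of a weight matrix. It assumes a block is scored by its largest-magnitude weight and dropped when that score falls below a fixed threshold; this summary does not state the paper's exact criterion or threshold schedule, so both are assumptions. The names `block_prune`, `block_sparsity`, `block_shape`, and `threshold` are hypothetical.

```python
import numpy as np


def block_prune(weights, block_shape, threshold):
    """Zero every block whose largest absolute weight is below `threshold`.

    Assumption: the block score is the max magnitude inside the block.
    Returns the pruned matrix and a boolean mask of kept blocks.
    """
    rows, cols = weights.shape
    br, bc = block_shape
    if rows % br or cols % bc:
        raise ValueError("matrix dimensions must be divisible by the block shape")

    # View the matrix as a grid of blocks: (row_blocks, br, col_blocks, bc).
    blocks = weights.reshape(rows // br, br, cols // bc, bc)

    # One score per block: the largest absolute value it contains.
    block_max = np.abs(blocks).max(axis=(1, 3))
    keep = block_max >= threshold

    # Broadcast the per-block mask back over every element of each block.
    pruned = blocks * keep[:, None, :, None]
    return pruned.reshape(rows, cols), keep


def block_sparsity(keep):
    """Fraction of blocks that were zeroed out."""
    return 1.0 - keep.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 8))
    pruned, keep = block_prune(w, block_shape=(4, 4), threshold=2.5)
    print("kept blocks:\n", keep)
    print("block sparsity:", block_sparsity(keep))
```

Because entire blocks are either kept or zeroed, the surviving weights can be stored and multiplied as small dense tiles, which is the property that makes block sparsity more hardware-friendly than scattered individual zeros.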
Findings
Achieved 80-90% sparsity with minimal accuracy loss.
Reduced model size by approximately 10x.
Enhanced hardware efficiency over unstructured sparsity.
Abstract
Recurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. Sparsity is a technique to reduce compute and memory requirements of deep learning models. Sparse RNNs are easier to deploy on devices and high-end server processors. Even though sparse operations need less compute and memory relative to their dense counterparts, the speed-up observed by using sparse operations is less than expected on different hardware platforms. In order to address this issue, we investigate two different approaches to induce block sparsity in RNNs: pruning blocks of weights in a layer and using group lasso regularization to create blocks of weights with zeros. Using these techniques, we demonstrate that we can create block-sparse RNNs with sparsity ranging from 80% to 90% with small loss in accuracy. This allows us…
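To make the group lasso idea in the abstract concrete, here is a minimal sketch of a block-wise group lasso penalty: the sum of the L2 norms of each block, scaled by a regularization strength. Adding such a term to the training loss pushes whole blocks toward zero together. The function name `group_lasso_penalty`, the parameter `lam`, and the non-overlapping block layout are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def group_lasso_penalty(weights, block_shape, lam):
    """Return lam * sum over blocks of the L2 norm of each block.

    Assumption: groups are non-overlapping rectangular blocks that
    tile the weight matrix exactly.
    """
    rows, cols = weights.shape
    br, bc = block_shape
    if rows % br or cols % bc:
        raise ValueError("matrix dimensions must be divisible by the block shape")

    # View the matrix as a grid of blocks: (row_blocks, br, col_blocks, bc).
    blocks = weights.reshape(rows // br, br, cols // bc, bc)

    # L2 norm of each block, computed over the within-block axes.
    block_norms = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    return lam * block_norms.sum()


if __name__ == "__main__":
    w = np.array([[3.0, 4.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.0]])
    # The first 2x2 block has norm 5, the second has norm 0.
    print(group_lasso_penalty(w, block_shape=(2, 2), lam=0.1))
```

Unlike a plain L1 penalty, which treats every weight independently, this penalty is non-differentiable only where an entire block is zero, so it favors solutions in which whole blocks vanish rather than isolated weights.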
Taxonomy
Topics: Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices · Domain Adaptation and Few-Shot Learning
Methods: Pruning
