Block-Sparse Recurrent Neural Networks

Sharan Narang; Eric Undersander; Gregory Diamos

arXiv:1711.02782 · cs.LG · November 9, 2017 · 96 cites

Open Access

TL;DR

This paper explores methods to induce block sparsity in RNNs, achieving high sparsity levels with minimal accuracy loss, thereby reducing model size and improving hardware efficiency for deployment.

Contribution

The paper introduces two approaches, block pruning and group lasso regularization, for creating highly sparse RNNs that run efficiently on real hardware.
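As a rough illustration of the second approach, the sketch below computes a group lasso penalty over fixed-size blocks of a weight matrix: the sum of each block's L2 norm, scaled by a regularization strength. The function name `group_lasso_penalty`, the `lam` parameter, and the block tiling are assumptions made for this example and are not taken from the paper's code.

```python
import math


def group_lasso_penalty(weights, block_rows, block_cols, lam):
    """Return lam * sum of per-block L2 norms for a 2-D list of weights.

    Hypothetical helper for illustration only. Blocks tile the matrix
    from the top-left corner; edge blocks are smaller if the dimensions
    do not divide evenly.
    """
    rows, cols = len(weights), len(weights[0])
    total = 0.0
    for r0 in range(0, rows, block_rows):
        for c0 in range(0, cols, block_cols):
            # Squared L2 norm of one block.
            sq = sum(
                weights[r][c] ** 2
                for r in range(r0, min(r0 + block_rows, rows))
                for c in range(c0, min(c0 + block_cols, cols))
            )
            total += math.sqrt(sq)
    return lam * total


# Same nonzero values, different placement, using 1x2 blocks.
concentrated = [[3.0, 4.0], [0.0, 0.0]]  # both nonzeros share one block
spread = [[3.0, 0.0], [4.0, 0.0]]        # nonzeros sit in two blocks

penalty_concentrated = group_lasso_penalty(concentrated, 1, 2, lam=1.0)
penalty_spread = group_lasso_penalty(spread, 1, 2, lam=1.0)
```

Under these assumptions the penalty is lower when the nonzero weights are packed into fewer blocks, which is why adding such a term to the training loss tends to push whole blocks toward zero.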

Findings

01. Achieved 80-90% sparsity with minimal accuracy loss.

02. Reduced model size by approximately 10x.

03. Enhanced hardware efficiency over unstructured sparsity.
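The link between the sparsity level and the size reduction can be checked with simple arithmetic. The sketch below is a simplified storage model, not a measurement from the paper: it assumes 4-byte weights, one 4-byte index stored per retained block, and no other format overhead. The function name and all parameters are hypothetical.

```python
def compression_ratio(sparsity, block_elems=1, bytes_per_weight=4, bytes_per_index=4):
    """Dense bytes divided by sparse bytes under a simplified storage model.

    Assumption: each retained block of `block_elems` weights carries one
    index, so index cost per retained weight is bytes_per_index / block_elems.
    """
    kept = 1.0 - sparsity
    dense_bytes = bytes_per_weight
    sparse_bytes = kept * (bytes_per_weight + bytes_per_index / block_elems)
    return dense_bytes / sparse_bytes


# 90% sparsity: one index per weight versus one index per 4x4 block.
unstructured = compression_ratio(0.9, block_elems=1)
blocked = compression_ratio(0.9, block_elems=16)
```

In this model, 90% sparsity leaves 10% of the weights, so the ratio approaches 10x only when index overhead is small. Sharing one index across a block amortizes that overhead, which is one reason block sparsity can come closer to the ideal figure than unstructured sparsity.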

Abstract

Recurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. Sparsity is a technique to reduce compute and memory requirements of deep learning models. Sparse RNNs are easier to deploy on devices and high-end server processors. Even though sparse operations need less compute and memory relative to their dense counterparts, the speed-up observed by using sparse operations is less than expected on different hardware platforms. In order to address this issue, we investigate two different approaches to induce block sparsity in RNNs: pruning blocks of weights in a layer and using group lasso regularization to create blocks of weights with zeros. Using these techniques, we demonstrate that we can create block-sparse RNNs with sparsity ranging from 80% to 90% with small loss in accuracy. This allows us…
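The first approach named in the abstract, pruning blocks of weights, can be sketched as follows. This is a minimal single-step illustration, assuming each block is represented by its largest-magnitude weight and is zeroed when that value falls below a threshold. The function name, the fixed threshold, and the list-of-lists representation are choices made for this example; a threshold schedule over the course of training is not modeled here.

```python
def block_prune(weights, block_rows, block_cols, threshold):
    """Zero every block whose maximum absolute weight is below `threshold`.

    Hypothetical helper for illustration. Returns a pruned copy of the
    matrix and the fraction of entries that lie in zeroed blocks.
    """
    rows, cols = len(weights), len(weights[0])
    pruned = [row[:] for row in weights]  # leave the input untouched
    zeroed = 0
    for r0 in range(0, rows, block_rows):
        for c0 in range(0, cols, block_cols):
            cells = [
                (r, c)
                for r in range(r0, min(r0 + block_rows, rows))
                for c in range(c0, min(c0 + block_cols, cols))
            ]
            # Representative value for the block: its largest magnitude.
            if max(abs(weights[r][c]) for r, c in cells) < threshold:
                for r, c in cells:
                    pruned[r][c] = 0.0
                zeroed += len(cells)
    return pruned, zeroed / (rows * cols)


W = [
    [0.1, -0.2, 0.9, 0.0],
    [0.0, 0.05, 0.1, 0.1],
    [1.5, 0.0, 0.01, 0.02],
    [0.0, 0.0, 0.03, -0.04],
]
pruned_W, sparsity = block_prune(W, 2, 2, threshold=0.5)
```

With 2x2 blocks and this threshold, the top-left and bottom-right blocks are removed entirely, while the other two blocks keep all of their entries, including the small ones. Keeping or dropping whole blocks is what gives the resulting matrix a regular layout.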

Taxonomy

Topics: Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices · Domain Adaptation and Few-Shot Learning

Methods: Pruning