Pushing the limits of RNN Compression
Urmish Thakker, Igor Fedorov, Jesse Beu, Dibakar Gope, Chu Zhou, Ganesh Dasika, Matthew Mattina

TL;DR
This paper presents a novel RNN compression method using Kronecker products that achieves 16-38x size reduction with minimal accuracy loss, outperforming existing techniques across multiple benchmarks.
Contribution
Introduces Kronecker product-based compression for RNNs, significantly reducing size while maintaining or improving accuracy compared to prior methods.
Findings
Achieves 16-38x compression with minimal accuracy loss.
Outperforms pruning and low-rank matrix factorization on 4 benchmarks.
Improves inference run-time in resource-constrained environments.
Abstract
Recurrent Neural Networks (RNNs) can be difficult to deploy on resource-constrained devices due to their size. As a result, there is a need for compression techniques that can significantly compress RNNs without negatively impacting task accuracy. This paper introduces a method to compress RNNs for resource-constrained environments using Kronecker products (KP). KPs can compress RNN layers by 16-38x with minimal accuracy loss. We show that KP can beat the task accuracy achieved by other state-of-the-art compression techniques (pruning and low-rank matrix factorization) across 4 benchmarks spanning 3 different applications, while simultaneously improving inference run-time.
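To make the idea in the abstract concrete, here is a minimal NumPy sketch of why a Kronecker-product factorization can shrink a weight matrix and speed up the matrix-vector product. It is an illustration under stated assumptions, not the authors' implementation: the function names kp_matvec and kp_param_count, the 16x16 factor shapes, and the random weights are all hypothetical and are not taken from the paper. The sketch relies on the standard identity (A kron B) vec(X) = vec(A X B^T) for row-major vec.

```python
import numpy as np


def kp_param_count(a_shape, b_shape):
    """Number of stored parameters when W is represented as A kron B."""
    return a_shape[0] * a_shape[1] + b_shape[0] * b_shape[1]


def kp_matvec(A, B, x):
    """Compute (A kron B) @ x without materializing the full matrix.

    Uses (A kron B) vec(X) = vec(A X B^T), where vec is row-major
    (NumPy's default reshape order).
    """
    m1, n1 = A.shape
    m2, n2 = B.shape
    X = x.reshape(n1, n2)              # unfold the input vector into a matrix
    return (A @ X @ B.T).reshape(m1 * m2)


# Hypothetical shapes: a 256x256 weight matrix factored as 16x16 kron 16x16.
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 16))
B = rng.standard_normal((16, 16))
x = rng.standard_normal(256)

# Reference result using the explicit (dense) Kronecker product.
W = np.kron(A, B)
y_dense = W @ x

# Factored result: never builds the 256x256 matrix.
y_fast = kp_matvec(A, B, x)

dense_params = W.size
kp_params = kp_param_count(A.shape, B.shape)
print("max abs difference:", np.max(np.abs(y_dense - y_fast)))
print("dense params:", dense_params, "KP params:", kp_params)
```

The per-matrix parameter ratio in this toy example depends entirely on the chosen factor shapes and should not be read as the paper's reported 16-38x figure, which is measured on the benchmark networks.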
Taxonomy
Methods: Kollen-Pollack Learning
