Run-Time Efficient RNN Compression for Inference on Edge Devices

Urmish Thakker; Jesse Beu; Dibakar Gope; Ganesh Dasika; Matthew Mattina

arXiv:1906.04886·cs.LG·August 14, 2020

TL;DR

This paper introduces Hybrid Matrix Decomposition (HMD), an RNN compression method for edge-device inference that shrinks models by 2-4x while running faster than pruning and retaining more accuracy than matrix factorization.

Contribution

The paper proposes a new RNN compression technique called Hybrid Matrix Decomposition that balances compression, speed, and accuracy for edge device inference.

Findings

01. Achieves 2-4x compression of RNNs
02. Faster run-time than pruning methods
03. Retains more accuracy than matrix factorization

Abstract

Recurrent neural networks can be large and compute-intensive, yet many applications that benefit from RNNs run on small devices with very limited compute and storage capabilities while still having run-time constraints. As a result, there is a need for compression techniques that can achieve significant compression without negatively impacting inference run-time and task accuracy. This paper explores a new compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) that achieves this dual objective. This scheme divides the weight matrix into two parts: an unconstrained upper half and a lower half composed of rank-1 blocks. This results in output features where the upper sub-vector has "richer" features while the lower sub-vector has "constrained" features. HMD can compress RNNs by a factor of 2-4x while having a faster run-time than pruning (Zhu & Gupta, 2017) and…
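The split described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how the rank-1 blocks are laid out, so the sketch assumes the lower half is partitioned column-wise into blocks, each stored as an outer product of two vectors. The function names (`hmd_matvec`, `hmd_dense`, `hmd_param_count`) are hypothetical and exist only for this illustration.

```python
import numpy as np


def hmd_matvec(upper, us, vs, x):
    """Compute y = W @ x without building the full matrix W.

    upper: dense (m_upper, n) block, the unconstrained part.
    us, vs: lists of vectors; block j of the lower half is outer(us[j], vs[j]).
            ASSUMPTION: blocks sit side by side, so the lengths of vs sum to n.
    """
    y_upper = upper @ x                      # "richer" features
    y_lower = np.zeros(us[0].shape[0])       # "constrained" features
    start = 0
    for u, v in zip(us, vs):
        stop = start + v.shape[0]
        # A rank-1 block costs one dot product plus one scaled vector add.
        y_lower = y_lower + u * (v @ x[start:stop])
        start = stop
    return np.concatenate([y_upper, y_lower])


def hmd_dense(upper, us, vs):
    """Materialise the equivalent dense matrix (for checking only)."""
    lower = np.hstack([np.outer(u, v) for u, v in zip(us, vs)])
    return np.vstack([upper, lower])


def hmd_param_count(upper, us, vs):
    """Number of stored parameters in the factored form."""
    return upper.size + sum(u.size for u in us) + sum(v.size for v in vs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    upper = rng.standard_normal((4, 8))
    us = [rng.standard_normal(4) for _ in range(2)]
    vs = [rng.standard_normal(4) for _ in range(2)]
    x = rng.standard_normal(8)
    y = hmd_matvec(upper, us, vs, x)
    print(np.allclose(y, hmd_dense(upper, us, vs) @ x))
    print(hmd_param_count(upper, us, vs), "stored vs", 8 * 8, "dense")
```

In this toy layout the saving comes entirely from the lower half, so the achievable compression depends on how many rows are left unconstrained and how many rank-1 blocks are used; the sizes above are chosen for readability and say nothing about the paper's reported 2-4x figures.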

Taxonomy

Methods: Pruning