Dynamically Hierarchy Revolution: DirNet for Compressing Recurrent Neural Network on Mobile Devices
Jie Zhang, Xiaolong Wang, Dawei Li, Yalin Wang

TL;DR
This paper introduces DirNet, a novel dynamic compression method for RNNs that significantly reduces model size and enables real-time inference on mobile devices with minimal accuracy loss.
Contribution
DirNet employs an optimized dictionary learning algorithm to dynamically adjust compression and sparsity across layers, outperforming prior methods in RNN model compression.
Findings
Achieves up to 8x model size reduction on mobile devices
Maintains real-time inference with negligible accuracy loss
Outperforms previous compression approaches in experiments
Abstract
Recurrent neural networks (RNNs) achieve cutting-edge performance on a variety of problems. However, due to their high computational and memory demands, deploying RNNs on resource-constrained mobile devices is a challenging task. To guarantee minimum accuracy loss at a higher compression rate, and driven by mobile resource requirements, we introduce DirNet, a novel model compression approach based on an optimized fast dictionary learning algorithm, which 1) dynamically mines the dictionary atoms of the projection dictionary matrix within each layer to adjust the compression rate and 2) adaptively changes the sparsity of the sparse codes across the hierarchical layers. Experimental results on a language model and an ASR model trained with a 1000h speech dataset demonstrate that our method significantly outperforms prior approaches. Evaluated on off-the-shelf mobile devices, we are able to reduce the…
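To make the dictionary-plus-sparse-codes idea concrete, here is a minimal sketch of factoring one weight matrix W into a small dictionary D and a column-sparse code matrix C. It is not DirNet's algorithm: the dictionary is a stand-in taken from a truncated SVD, the codes are sparsified by simple top-magnitude thresholding, and the function names, the matrix size, and the num_atoms and nonzeros_per_column settings are all hypothetical choices for illustration. In this sketch, num_atoms plays the role of the per-layer compression knob and nonzeros_per_column the role of the sparsity knob.

```python
import numpy as np


def sparse_dictionary_approx(W, num_atoms, nonzeros_per_column):
    """Approximate W (m x n) as D @ C, with D (m x num_atoms) and column-sparse C.

    Illustrative stand-in only: SVD basis plus hard thresholding,
    not the dictionary learning procedure used in the paper.
    """
    # Stand-in dictionary: leading left singular vectors (orthonormal columns).
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    D = U[:, :num_atoms]
    # Because D has orthonormal columns, the least-squares codes are D.T @ W.
    C = D.T @ W
    # Zero all but the largest-magnitude entries in each column of C.
    num_dropped = num_atoms - nonzeros_per_column
    drop = np.argsort(np.abs(C), axis=0)[:num_dropped, :]
    np.put_along_axis(C, drop, 0.0, axis=0)
    return D, C


def compression_ratio(W, D, C):
    """Dense parameter count over stored values (ignores sparse index overhead)."""
    return W.size / (D.size + np.count_nonzero(C))


def relative_error(W, D, C):
    """Frobenius-norm reconstruction error relative to the norm of W."""
    return np.linalg.norm(W - D @ C) / np.linalg.norm(W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 1024))  # hypothetical layer weight matrix
    D, C = sparse_dictionary_approx(W, num_atoms=64, nonzeros_per_column=8)
    print("compression ratio:", compression_ratio(W, D, C))
    print("relative error:", relative_error(W, D, C))
```

Raising num_atoms or nonzeros_per_column lowers the reconstruction error and the compression ratio together, which is the trade-off a per-layer adaptive scheme would tune. A random matrix like the one above compresses poorly; trained weight matrices usually have more structure to exploit.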
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Topic Modeling
