Learning Intrinsic Sparse Structures within Long Short-Term Memory
Wei Wen, Yuxiong He, Samyam Rajbhandari, Minjia Zhang, Wenhan Wang, Fang Liu, Bin Hu, Yiran Chen, Hai Li

TL;DR
This paper introduces Intrinsic Sparse Structures (ISS) within LSTM units for structured model compression. The compressed LSTMs remain regular (dense, only with smaller basic structures) and keep their accuracy while becoming significantly smaller and faster.
Contribution
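ISS groups the weights of an LSTM so that removing one ISS component shrinks every basic structure (input updates, gates, hidden states, cell states and outputs) by one at the same time. Dimensions therefore stay consistent, and compression never produces an invalid LSTM unit.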
Findings
Achieved a 10.59x speedup on Penn TreeBank language modeling without losing any perplexity.
Obtained a compact model with only 2.69M weights for machine question answering on SQuAD.
Extended the approach to non-LSTM RNNs such as Recurrent Highway Networks (RHNs).
Abstract
Model compression is significant for the wide adoption of Recurrent Neural Networks (RNNs) in both user devices possessing limited resources and business clusters requiring quick responses to large-scale service requests. This work aims to learn structurally-sparse Long Short-Term Memory (LSTM) by reducing the sizes of basic structures within LSTM units, including input updates, gates, hidden states, cell states and outputs. Independently reducing the sizes of basic structures can result in inconsistent dimensions among them, and consequently, end up with invalid LSTM units. To overcome the problem, we propose Intrinsic Sparse Structures (ISS) in LSTMs. Removing a component of ISS will simultaneously decrease the sizes of all basic structures by one and thereby always maintain the dimension consistency. By learning ISS within LSTM units, the obtained LSTMs remain regular while having…
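To make the dimension-consistency argument concrete, here is a minimal pure-Python sketch of what removing one ISS component does to a single LSTM layer. The weight layout is an assumption chosen for illustration, not the authors' code: input weights W_x of shape d × 4h, recurrent weights W_h of shape h × 4h, a bias of length 4h with columns grouped as input gate, forget gate, output gate and input update, and a following layer W_next (h × m) that consumes h_t. All function names are hypothetical. Under that layout, component k owns the k-th column of every gate block in W_x, W_h and b, the k-th row of W_h (the recurrent input from h_{t-1}[k]) and the k-th row of W_next. Deleting exactly those weights shrinks every basic structure by one, so the smaller layer still runs.

```python
import math
import random

# Hypothetical layout chosen for illustration (not the authors' code):
#   W_x:    d x 4h input weights
#   W_h:    h x 4h recurrent weights
#   b:      4h bias
#   W_next: h x m weights of a following layer that consumes h_t
# The 4h columns are grouped as [input gate | forget gate | output gate | input update].


def matvec(x, W):
    """Row vector x (length n) times matrix W (n x p), returned as a list of length p."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, W_x, W_h, b):
    """One LSTM time step; the hidden size n is inferred from the bias length."""
    n = len(b) // 4
    z = [a + r + bias for a, r, bias in zip(matvec(x, W_x), matvec(h, W_h), b)]
    i = [sigmoid(v) for v in z[0:n]]          # input gate
    f = [sigmoid(v) for v in z[n:2 * n]]      # forget gate
    o = [sigmoid(v) for v in z[2 * n:3 * n]]  # output gate
    u = [math.tanh(v) for v in z[3 * n:]]     # input update
    c_new = [fk * ck + ik * uk for fk, ck, ik, uk in zip(f, c, i, u)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def remove_iss_component(k, W_x, W_h, b, W_next):
    """Delete every weight tied to ISS component k, shrinking all basic structures by one."""
    h = len(b) // 4
    cols = {g * h + k for g in range(4)}  # k-th column of each of the four gate blocks

    def drop_cols(M):
        return [[v for j, v in enumerate(row) if j not in cols] for row in M]

    W_x2 = drop_cols(W_x)
    # Row k of W_h carries h_{t-1}[k], which no longer exists after removal.
    W_h2 = [row for r, row in enumerate(drop_cols(W_h)) if r != k]
    b2 = [v for j, v in enumerate(b) if j not in cols]
    # The next layer loses its k-th input (the k-th output of this layer).
    W_next2 = [row for r, row in enumerate(W_next) if r != k]
    return W_x2, W_h2, b2, W_next2


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d, h, m = 3, 5, 2
W_x, W_h, W_next = rand_matrix(d, 4 * h), rand_matrix(h, 4 * h), rand_matrix(h, m)
b = [0.0] * (4 * h)

# Remove component 2: the layer goes from 5 to 4 hidden units and still runs.
W_x2, W_h2, b2, W_next2 = remove_iss_component(2, W_x, W_h, b, W_next)
x = [0.1, -0.2, 0.3]
h_t, c_t = lstm_step(x, [0.0] * (h - 1), [0.0] * (h - 1), W_x2, W_h2, b2)
y = matvec(h_t, W_next2)
print(len(h_t), len(c_t), len(y))  # 4 4 2
```

Deleting only part of a component, for example the gate columns but not row k of W_h, would leave the recurrent matrix with h rows while h_{t-1} has h − 1 entries, which is the kind of invalid LSTM unit the abstract describes. Conversely, if every weight of component k is already zero, the other units of the full-size layer compute the same values as the reduced layer, so such a component can be removed without changing the outputs. How components are driven to zero during training is not shown in this sketch.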
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
