Intrinsically Sparse Long Short-Term Memory Networks
Shiwei Liu, Decebal Constantin Mocanu, Mykola Pechenizkiy

TL;DR
This paper introduces SET-LSTM, an LSTM trained with Sparse Evolutionary Training (SET), which significantly reduces the parameter count while maintaining or improving performance on sentiment analysis tasks.
Contribution
The paper proposes a novel sparse LSTM architecture trained with the Sparse Evolutionary Training (SET) procedure, enabling training and inference with far fewer parameters.
Findings
SET-LSTM achieves accuracy better than or comparable to that of dense LSTMs.
SET-LSTM uses less than 4% of the parameters of its dense counterparts.
The approach is effective across multiple sentiment analysis datasets.
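To give a feel for what a parameter budget of under 4% means, the following is a minimal sketch that counts the parameters of a single LSTM layer at a given connection density. The function name `lstm_param_count`, the layer sizes, and the 3% density are hypothetical values chosen for illustration and are not taken from the paper. The sketch also assumes that only the weight matrices are sparsified and that biases stay dense.

```python
def lstm_param_count(input_size, hidden_size, density=1.0):
    """Count parameters of one LSTM layer (hypothetical helper).

    An LSTM has four gate blocks (input, forget, cell, output); each has
    an input-to-hidden matrix, a hidden-to-hidden matrix, and a bias.
    `density` is the fraction of weight connections kept; biases are
    assumed to stay dense.
    """
    weights = 4 * hidden_size * (input_size + hidden_size)
    biases = 4 * hidden_size
    return int(round(weights * density)) + biases


# Illustrative sizes only, not the configuration used in the paper.
dense = lstm_param_count(256, 256)
sparse = lstm_param_count(256, 256, density=0.03)
print(f"dense: {dense}, sparse: {sparse}, ratio: {sparse / dense:.3f}")
```

Because biases are kept dense in this sketch, the overall ratio ends up slightly above the weight density itself.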
Abstract
Long Short-Term Memory (LSTM) has achieved state-of-the-art performance on a wide range of tasks. Its strong performance stems from its long-term memory, which suits sequential data well, and from the gating structure that controls the information flow. However, LSTMs tend to be memory-bandwidth limited in realistic applications and require long training and inference times as model sizes keep increasing. To tackle this problem, various efficient model compression methods have been proposed. Most of them need a large and expensive pre-trained model, which is impractical for resource-limited devices where the memory budget is strictly limited. To remedy this situation, in this paper, we incorporate the Sparse Evolutionary Training (SET) procedure into LSTM, proposing a novel model dubbed SET-LSTM. Rather than starting with a fully-connected…
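The abstract is cut off before it describes the training procedure, so the following is only a sketch of the general SET idea as it is commonly described: keep a sparse connectivity mask, periodically remove a fraction of the weakest connections, and regrow the same number at random empty positions. The function name `set_prune_and_regrow`, the pruning fraction `zeta`, the magnitude-based pruning rule, and the initialization scale for new weights are all assumptions made for illustration. The sketch applies to a single weight matrix and does not show how SET-LSTM treats the individual gates or the embedding layer.

```python
import numpy as np


def set_prune_and_regrow(weights, mask, zeta=0.3, rng=None):
    """One SET-style topology update on a single weight matrix (sketch).

    weights: 2-D float array, assumed zero wherever mask is False.
    mask:    2-D bool array marking active connections.
    zeta:    fraction of active connections to replace (assumed value).
    """
    rng = np.random.default_rng() if rng is None else rng
    weights = weights.copy()
    mask = mask.copy()

    active = np.flatnonzero(mask)
    n_prune = int(zeta * active.size)
    if n_prune == 0:
        return weights, mask

    # Prune: drop the active connections with the smallest magnitude.
    magnitudes = np.abs(weights.flat[active])
    prune_idx = active[np.argsort(magnitudes)[:n_prune]]
    mask.flat[prune_idx] = False
    weights.flat[prune_idx] = 0.0

    # Regrow: activate the same number of connections at random empty
    # positions (which may include positions that were just pruned).
    inactive = np.flatnonzero(~mask)
    grow_idx = rng.choice(inactive, size=n_prune, replace=False)
    mask.flat[grow_idx] = True
    weights.flat[grow_idx] = rng.normal(0.0, 0.01, size=n_prune)

    return weights, mask


# Usage on a toy matrix with roughly 25% density.
rng = np.random.default_rng(0)
mask = rng.random((8, 8)) < 0.25
w = rng.normal(size=(8, 8)) * mask
w_new, mask_new = set_prune_and_regrow(w, mask, zeta=0.5, rng=rng)
print(mask.sum(), mask_new.sum())
```

The number of active connections is the same before and after the update, so the parameter budget stays fixed throughout training while the topology changes.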
Taxonomy
Topics: Topic Modeling · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
