TL;DR
This paper introduces Restricted Recurrent Neural Networks (RRNNs), a parameter-efficient architecture that shrinks RNNs by sharing weights between the input and hidden-state weight matrices, achieving comparable or better language-modeling performance with fewer parameters.
Contribution
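The paper proposes RRNN, an RNN compression method in which the weight matrices for the input data and the hidden state share a large proportion of their parameters at each time step. It requires no pre-training or sophisticated fine-tuning, and it maintains or improves performance.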
Findings
RRNN achieves about 50% parameter reduction.
Restricted LSTM outperforms classical LSTM with fewer parameters.
Performance remains comparable or better despite compression.
Abstract
Recurrent Neural Network (RNN) and its variations, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), have become standard building blocks for learning online data of sequential nature in many research areas, including natural language processing and speech data analysis. In this paper, we present a new methodology to significantly reduce the number of parameters in RNNs while maintaining performance that is comparable to or even better than that of classical RNNs. The new proposal, referred to as Restricted Recurrent Neural Network (RRNN), restricts the weight matrices corresponding to the input data and hidden states at each time step to share a large proportion of parameters. The new architecture can be regarded as a compression of its classical counterpart, but it does not require pre-training or sophisticated parameter fine-tuning, both of which are major issues in most…
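The sharing idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the row-block sharing layout, the equal input and hidden sizes, and the names `init_restricted_rnn`, `step`, `count_weights`, and `share_ratio` are all assumptions made here for illustration. The sketch only shows how letting the input and hidden-state matrices reuse one block of weights lowers the stored parameter count.

```python
import numpy as np

def init_restricted_rnn(size, share_ratio, seed=0):
    """Hypothetical layout: input and hidden vectors both have length `size`.

    A fraction `share_ratio` of the rows is stored once and reused by both
    the input matrix W_x and the hidden-state matrix W_h.
    """
    rng = np.random.default_rng(seed)
    n_shared = int(round(share_ratio * size))
    n_own = size - n_shared
    scale = 1.0 / np.sqrt(size)
    return {
        "shared": rng.normal(0.0, scale, (n_shared, size)),  # used by W_x and W_h
        "own_x": rng.normal(0.0, scale, (n_own, size)),      # input-only rows
        "own_h": rng.normal(0.0, scale, (n_own, size)),      # hidden-only rows
        "bias": np.zeros(size),
    }

def step(params, x, h):
    """One tanh RNN step with the two matrices assembled from shared blocks."""
    W_x = np.vstack([params["shared"], params["own_x"]])
    W_h = np.vstack([params["shared"], params["own_h"]])
    return np.tanh(W_x @ x + W_h @ h + params["bias"])

def count_weights(params):
    """Number of stored matrix weights (bias excluded)."""
    return sum(p.size for name, p in params.items() if name != "bias")

if __name__ == "__main__":
    size = 8
    params = init_restricted_rnn(size, share_ratio=0.5)
    h = step(params, np.ones(size), np.zeros(size))
    classical = 2 * size * size  # two independent size x size matrices
    print(count_weights(params), "stored weights vs", classical, "classical")
```

Under these assumptions, `size = 8` with `share_ratio = 0.5` stores 96 weights against 128 for two independent 8×8 matrices, and full sharing (`share_ratio = 1.0`) stores 64, half the classical count. How the paper itself partitions and trains the shared parameters, and how this extends to LSTM or GRU gates, is not specified in the excerpt above.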
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
