Empirical Evaluation of A New Approach to Simplifying Long Short-term Memory (LSTM)
Yuzhen Lu

TL;DR
This paper empirically compares the standard LSTM with three simplified variants that use fewer parameters, and finds comparable performance on two sequence datasets once the learning rate is tuned.
Contribution
It introduces and evaluates three simplified LSTM variants, obtained by removing the input signal, bias, and hidden unit signal from individual gates, and shows that they perform similarly to the standard model.
Findings
Simplified LSTM variants achieve comparable accuracy to standard LSTM.
Reduced-parameter models require careful tuning of the learning rate.
Simplifications can reduce complexity without sacrificing performance.
Abstract
The standard LSTM, although it succeeds in modeling long-range dependencies, suffers from a highly complex structure that can be simplified through modifications to its gate units. This paper performs an empirical comparison between the standard LSTM and three new simplified variants, obtained by eliminating the input signal, bias, and hidden unit signal from individual gates, on the task of modeling two sequence datasets. The experiments show that the three variants, with fewer parameters, can achieve performance comparable to the standard LSTM. Due attention should be paid to tuning the learning rate to achieve high accuracy.
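The gate simplification described in the abstract can be illustrated with a minimal sketch. It assumes a standard LSTM without peephole connections, in which the cell-input block and each of the three gates (input, forget, output) has an input weight matrix W, a recurrent weight matrix U, and a bias b. The names `GateTerms`, `lstm_param_count`, and `gate_activation` are hypothetical, and this summary does not state exactly which terms each of the three variants drops, so the flag combinations below are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GateTerms:
    """Hypothetical flags: which terms feed each gate's pre-activation."""
    use_input: bool = True   # W @ x_t (input signal)
    use_hidden: bool = True  # U @ h_{t-1} (hidden unit signal)
    use_bias: bool = True    # b (bias)


def lstm_param_count(n_in: int, n_hidden: int, terms: GateTerms) -> int:
    """Count parameters when all three gates share the same simplification.

    Assumption: the cell-input (candidate) block always keeps W, U and b;
    only the three gates are simplified.
    """
    full_block = n_hidden * n_in + n_hidden * n_hidden + n_hidden
    per_gate = 0
    if terms.use_input:
        per_gate += n_hidden * n_in
    if terms.use_hidden:
        per_gate += n_hidden * n_hidden
    if terms.use_bias:
        per_gate += n_hidden
    return full_block + 3 * per_gate


def gate_activation(x, h_prev, W, U, b, terms: GateTerms):
    """Sigmoid gate built only from the enabled terms (plain Python lists)."""
    out = []
    for i in range(len(b)):
        z = 0.0
        if terms.use_input:
            z += sum(w * xj for w, xj in zip(W[i], x))
        if terms.use_hidden:
            z += sum(u * hj for u, hj in zip(U[i], h_prev))
        if terms.use_bias:
            z += b[i]
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out


# Illustrative sizes only (not taken from the paper).
standard = lstm_param_count(28, 100, GateTerms())
no_input = lstm_param_count(28, 100, GateTerms(use_input=False))
bias_only = lstm_param_count(28, 100, GateTerms(use_input=False, use_hidden=False))
```

Under these assumptions, dropping the input signal from the gates removes three of the four input weight matrices, and keeping only the bias leaves each gate with a single vector of parameters. Because the gates then respond to less information, it is plausible that the best learning rate shifts, which is consistent with the abstract's note on tuning.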
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
