TL;DR
This large-scale study analyzes eight LSTM variants across three tasks, revealing that the standard LSTM architecture remains competitive and identifying key components and hyperparameters critical for performance.
Contribution
The paper provides the first extensive large-scale analysis of multiple LSTM variants, assessing their importance and hyperparameter effects using the fANOVA framework.
Findings
Standard LSTM performs comparably to variants.
Forget gate and output activation are most critical.
Hyperparameters are largely independent and can be tuned efficiently.
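To make the two components named above concrete, here is a minimal NumPy sketch of a single LSTM step without peephole connections. It is an illustration under stated assumptions, not the paper's implementation: the function name `lstm_step`, the stacked weight layout, and the two boolean switches are hypothetical. Turning the forget gate off fixes it at 1, and turning the output activation off passes the cell state through unchanged.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, used for all three gates."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, R, b,
              use_forget_gate=True, use_output_activation=True):
    """One forward step of an LSTM block without peepholes.

    Assumed (hypothetical) layout: W has shape (4n, m), R has shape (4n, n),
    and b has shape (4n,), stacked as [block input, input gate,
    forget gate, output gate] for n hidden units and m inputs.
    """
    n = h_prev.shape[0]
    pre = W @ x + R @ h_prev + b

    z = np.tanh(pre[0:n])              # block input (tanh activation)
    i = sigmoid(pre[n:2 * n])          # input gate
    if use_forget_gate:
        f = sigmoid(pre[2 * n:3 * n])  # forget gate
    else:
        f = np.ones(n)                 # ablation: cell state is never decayed
    o = sigmoid(pre[3 * n:4 * n])      # output gate

    c = f * c_prev + i * z             # new cell state
    # Output activation: tanh squashes the cell state before gating;
    # the ablation passes the raw cell state through instead.
    squashed = np.tanh(c) if use_output_activation else c
    h = o * squashed                   # new hidden output
    return h, c
```

Calling `lstm_step` with `use_forget_gate=False` or `use_output_activation=False` gives a rough feel for the two ablations the findings single out; without the output activation the cell state, and therefore the output, is no longer bounded by tanh.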
Abstract
Several variants of the Long Short-Term Memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful fANOVA framework. In total, we summarize the results of 5400 experimental runs (≈ 15 years of CPU time), which makes our study the largest of its kind on…
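The random search mentioned in the abstract can be sketched in a few lines of standard-library Python. Everything specific here is an assumption for illustration: the hyperparameter names, the sampling ranges, and the toy objective are hypothetical and are not taken from the paper. The fANOVA importance analysis is not shown.

```python
import math
import random


def sample_config(rng):
    """Draw one hyperparameter setting (hypothetical names and ranges)."""
    return {
        # Log-uniform sampling: every order of magnitude is equally likely.
        "learning_rate": 10 ** rng.uniform(-6, -2),
        # Uniform integer sampling for the number of hidden units.
        "hidden_size": rng.randint(20, 200),
    }


def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials independent random settings; return the best one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, math.inf
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = objective(cfg)  # lower is better, e.g. validation error
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_objective(cfg):
    """Stand-in for training a network: minimal at learning_rate = 1e-3."""
    return (math.log10(cfg["learning_rate"]) + 3.0) ** 2


best_cfg, best_score = random_search(toy_objective, n_trials=200)
```

In a real study `toy_objective` would be replaced by a function that trains a network with the sampled setting and returns its validation error. Because each trial is drawn independently, the trials can be run in parallel.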
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
