LSTM: A Search Space Odyssey

Klaus Greff; Rupesh Kumar Srivastava; Jan Koutník; Bas R. Steunebrink; Jürgen Schmidhuber

arXiv:1503.04069·cs.NE·October 5, 2017

TL;DR

This large-scale study analyzes eight LSTM variants across three tasks, revealing that the standard LSTM architecture remains competitive and identifying key components and hyperparameters critical for performance.

Contribution

The paper provides the first extensive large-scale analysis of multiple LSTM variants, assessing their importance and hyperparameter effects using the fANOVA framework.

Findings

01. The standard LSTM performs comparably to the variants.

02. The forget gate and the output activation function are the most critical components.

03. Hyperparameters are largely independent and can be tuned efficiently.
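To make the second finding concrete, here is a minimal sketch of one LSTM time step for a single scalar unit, showing where the forget gate and the output activation enter the computation. It is an illustration under stated assumptions, not the authors' implementation: peephole connections are omitted, everything is a scalar, and the function name, the parameter dictionary `p`, and the two switches are hypothetical. Turning a switch off fixes the forget gate to 1 or replaces the output activation with the identity, in the spirit of the corresponding ablated variants.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, used for all three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, y_prev, c_prev, p,
              use_forget_gate=True, use_output_activation=True):
    """One time step of a single-unit LSTM (scalars, no peepholes).

    p is a hypothetical dict of input weights (w*), recurrent weights (r*)
    and biases (b*) for the block input z and the gates i, f, o.
    """
    z = math.tanh(p["wz"] * x + p["rz"] * y_prev + p["bz"])  # block input
    i = sigmoid(p["wi"] * x + p["ri"] * y_prev + p["bi"])    # input gate
    if use_forget_gate:
        f = sigmoid(p["wf"] * x + p["rf"] * y_prev + p["bf"])
    else:
        f = 1.0  # ablation: the cell can never discard its old state
    c = z * i + c_prev * f                                   # new cell state
    o = sigmoid(p["wo"] * x + p["ro"] * y_prev + p["bo"])    # output gate
    # Ablation: without the output activation, the cell state is passed
    # through unsquashed and the block output is no longer bounded.
    h = math.tanh(c) if use_output_activation else c
    y = h * o                                                # block output
    return y, c


# All-zero parameters: every gate evaluates to sigmoid(0) = 0.5.
zero_params = {k: 0.0 for k in
               ("wz", "rz", "bz", "wi", "ri", "bi",
                "wf", "rf", "bf", "wo", "ro", "bo")}
y_full, c_full = lstm_step(1.0, 0.0, 2.0, zero_params)
y_nfg, c_nfg = lstm_step(1.0, 0.0, 2.0, zero_params, use_forget_gate=False)
y_noaf, c_noaf = lstm_step(1.0, 0.0, 2.0, zero_params,
                           use_output_activation=False)
```

With the forget gate disabled, the previous cell state is carried over in full, so the cell state can only accumulate; with the output activation disabled, the block output grows with the cell state instead of staying in (-1, 1).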

Abstract

Several variants of the Long Short-Term Memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful fANOVA framework. In total, we summarize the results of 5400 experimental runs ($\approx 15$ years of CPU time), which makes our study the largest of its kind on…
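The random search mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the study's actual setup: the hyperparameter names, the sampling ranges, and the `toy_error` objective are assumptions chosen for the example, and a real run would train an LSTM for each sampled configuration instead of calling a toy function.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration; names and ranges are illustrative assumptions."""
    return {
        # log-uniform: every order of magnitude is equally likely
        "learning_rate": 10 ** rng.uniform(-6, -2),
        "hidden_size": int(round(
            10 ** rng.uniform(math.log10(20), math.log10(200)))),
        # plain uniform
        "input_noise_std": rng.uniform(0.0, 1.0),
    }


def random_search(evaluate, n_trials, seed=0):
    """Evaluate n_trials independently sampled configurations.

    Returns the best (error, config) pair and the full list of trials.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((evaluate(config), config))
    best = min(trials, key=lambda trial: trial[0])
    return best, trials


def toy_error(config):
    """Hypothetical stand-in for 'train a network, return its test error'."""
    return abs(math.log10(config["learning_rate"]) + 3.0)


best, trials = random_search(toy_error, n_trials=50)
```

Because every trial is sampled independently, the full `trials` list can also serve as input to a later importance analysis such as fANOVA, rather than being discarded once the best configuration is found.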

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory