Learning Longer Memory in Recurrent Neural Networks

Tomas Mikolov; Armand Joulin; Sumit Chopra; Michael Mathieu; Marc'Aurelio Ranzato

arXiv:1412.7753 · cs.NE · April 20, 2015 · ICLR · 198 cites

TL;DR

This paper demonstrates that simple recurrent neural networks can learn long-term dependencies in sequential data through a structural modification that encourages some hidden units to change slowly, achieving language modeling performance comparable to the more complex LSTM models.

Contribution

The paper introduces a simple architectural modification to RNNs that enables learning longer-term patterns effectively, challenging the belief that complex units are necessary.

Findings

01. Achieves similar performance to LSTMs in language modeling

02. Enables RNNs to learn longer-term dependencies

03. Uses a near-identity recurrent weight matrix to extend memory

Abstract

Recurrent neural network is a powerful model that learns temporal patterns in sequential data. For a long time, it was believed that recurrent networks are difficult to train using simple optimizers, such as stochastic gradient descent, due to the so-called vanishing gradient problem. In this paper, we show that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent. This is achieved by using a slight structural modification of the simple recurrent neural network architecture. We encourage some of the hidden units to change their state slowly by making part of the recurrent weight matrix close to identity, thus forming kind of a longer term memory. We evaluate our model in language modeling experiments, where we obtain similar performance to the much more complex Long Short Term Memory (LSTM) networks (Hochreiter &…
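The idea of keeping part of the recurrent weight matrix close to identity can be illustrated with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names `context_step` and `retained_fraction`, the input matrix `B`, and the fixed scalar `alpha` are all hypothetical, and the "close to identity" block is assumed here to be `alpha * I` with `alpha` near 1. The sketch writes one input into a set of slow units, then feeds zeros and measures how much of the state survives.

```python
import numpy as np


def context_step(s_prev, x, B, alpha):
    # Leaky update for the slow units (assumed form, for illustration only).
    # The recurrent weight on these units is alpha * I, which is close to
    # the identity matrix when alpha is near 1.
    return (1.0 - alpha) * (B @ x) + alpha * s_prev


def retained_fraction(alpha, steps, n_ctx=4, n_in=3, seed=0):
    # Write one random input into the slow units, then feed zero inputs
    # and report the fraction of the initial state norm that remains.
    rng = np.random.default_rng(seed)
    B = rng.normal(size=(n_ctx, n_in))
    s0 = context_step(np.zeros(n_ctx), rng.normal(size=n_in), B, alpha)
    s = s0
    for _ in range(steps):
        s = context_step(s, np.zeros(n_in), B, alpha)
    return float(np.linalg.norm(s) / np.linalg.norm(s0))


# Compare a near-identity recurrence with a quickly decaying one.
slow = retained_fraction(alpha=0.95, steps=20)
fast = retained_fraction(alpha=0.5, steps=20)
print(f"alpha=0.95 keeps {slow:.3f} of the state after 20 steps")
print(f"alpha=0.50 keeps {fast:.2e} of the state after 20 steps")
```

With zero input the state shrinks by a factor of `alpha` per step, so the retained fraction is `alpha ** steps`. A value of `alpha` near 1 therefore lets these units carry information over many more time steps than a quickly decaying unit, which is the intuition behind the longer-term memory described in the abstract.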

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Neural Networks and Applications · Topic Modeling