Recurrent Neural Network Regularization

Wojciech Zaremba; Ilya Sutskever; Oriol Vinyals

arXiv:1409.2329 · cs.NE · February 20, 2015 · 2.3k cites

TL;DR

This paper shows how to apply dropout to LSTM-based RNNs so that it substantially reduces overfitting across several sequence modeling tasks.

Contribution

It shows that applying dropout only to the non-recurrent connections of an LSTM, while leaving the recurrent connections untouched, regularizes the network effectively where naively applied dropout fails.

Findings

01 Dropout applied correctly to LSTMs reduces overfitting.

02 Improved performance on language modeling, speech recognition, image captioning, and translation.

03 Significant accuracy gains across various tasks.

Abstract

We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks. These tasks include language modeling, speech recognition, image caption generation, and machine translation.
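The abstract does not spell out what "correctly" means. In the paper, the dropout operator is applied only to the non-recurrent connections (the input to each layer and the output of the top layer), while the hidden and cell states passed from one time step to the next are left untouched. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. All function names (`dropout`, `lstm_step`, `init_params`, `run_stacked_lstm`), the gate labels, the initialization range, and the use of inverted dropout scaling are assumptions made for illustration.

```python
import math
import random


def dropout(vec, p, rng, train=True):
    """Inverted dropout (an assumed variant): zero each unit with
    probability p and rescale the survivors by 1 / (1 - p)."""
    if not train or p == 0.0:
        return list(vec)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in vec]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def lstm_step(params, x, h_prev, c_prev):
    """One LSTM step. params maps a gate name to (W_x, W_h, b)."""
    def gate(name, act):
        W_x, W_h, b = params[name]
        pre = [a + r + bias
               for a, r, bias in zip(matvec(W_x, x), matvec(W_h, h_prev), b)]
        return [act(v) for v in pre]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell update
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def init_params(n_in, n_hid, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]
    return {name: (mat(n_hid, n_in), mat(n_hid, n_hid), [0.0] * n_hid)
            for name in "ifog"}


def run_stacked_lstm(layers, inputs, p, rng, train=True):
    """Run a stack of equally sized LSTM layers over a sequence."""
    n_hid = len(layers[0]["i"][2])
    h = [[0.0] * n_hid for _ in layers]
    c = [[0.0] * n_hid for _ in layers]
    outputs = []
    for x in inputs:
        below = x
        for l, params in enumerate(layers):
            # Dropout only on the non-recurrent input coming from below.
            below = dropout(below, p, rng, train)
            # h[l] and c[l] carry over from the previous time step with
            # no dropout applied, so memory is not corrupted over time.
            h[l], c[l] = lstm_step(params, below, h[l], c[l])
            below = h[l]
        # Dropout on the top layer's output, also a non-recurrent connection.
        outputs.append(dropout(below, p, rng, train))
    return outputs


rng = random.Random(0)
layers = [init_params(3, 4, rng), init_params(4, 4, rng)]
sequence = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
train_out = run_stacked_lstm(layers, sequence, p=0.5, rng=rng, train=True)
eval_out = run_stacked_lstm(layers, sequence, p=0.5, rng=rng, train=False)
```

With `train=False` the masks are disabled and the forward pass is deterministic. Because each unit's information crosses a dropout mask only when moving between layers, never when moving between time steps, the number of masks it passes through depends on network depth rather than sequence length.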

Taxonomy

Topics: Neural Networks and Applications · Music and Audio Processing · Image Retrieval and Classification Techniques

Methods: Dropout