Recurrent Neural Network Regularization
Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals

TL;DR
This paper shows how to apply dropout to LSTM-based RNNs so that it regularizes them effectively, substantially reducing overfitting on language modeling, speech recognition, image caption generation, and machine translation.
Contribution
The paper shows how to apply dropout to LSTMs correctly, so that it regularizes effectively in a setting where dropout otherwise does not work well.
Findings
Dropout, as commonly applied, does not work well with RNNs and LSTMs.
Applied correctly to LSTMs, dropout substantially reduces overfitting.
The reduction is reported on language modeling, speech recognition, image caption generation, and machine translation.
Abstract
We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units. Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs. In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks. These tasks include language modeling, speech recognition, image caption generation, and machine translation.
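The abstract does not spell out what applying dropout "correctly" means. In the full paper, the dropout operator is applied only to the non-recurrent connections (the input arriving from the layer below), while the recurrent hidden-to-hidden path is left untouched so that information can persist across time steps. The NumPy sketch below illustrates that idea under stated assumptions: the function names (`dropout`, `lstm_step`, `run_layer`), the gate ordering, the weight layout, and the inverted-dropout rescaling are illustrative choices, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dropout(x, p, rng, train=True):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def lstm_step(x, h_prev, c_prev, W, U, b, p, rng, train=True):
    """One LSTM step with dropout on the non-recurrent (input) path only."""
    x = dropout(x, p, rng, train)       # input from the layer below: dropped
    gates = x @ W + h_prev @ U + b      # recurrent path h_prev: left intact
    i, f, o, g = np.split(gates, 4, axis=-1)  # assumed gate ordering
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def run_layer(xs, W, U, b, p, rng, train=True):
    """Run one LSTM layer over xs of shape (time, batch, features)."""
    hidden = U.shape[0]
    h = np.zeros((xs.shape[1], hidden))
    c = np.zeros_like(h)
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b, p, rng, train)
        outputs.append(h)
    return np.stack(outputs)
```

In a stacked network, the output of `run_layer` would be passed as `xs` to the next layer, so dropout is applied again on each upward connection but never on the connection from one time step to the next.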
Taxonomy
Topics: Neural Networks and Applications · Music and Audio Processing · Image Retrieval and Classification Techniques
Methods: Dropout
