A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Yarin Gal, Zoubin Ghahramani

TL;DR
This paper extends the Bayesian interpretation of dropout to recurrent neural networks and reports improved performance on language modelling and sentiment analysis tasks.
Contribution
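It introduces a variational inference based dropout technique for RNNs, applied to LSTM and GRU models, that outperforms existing techniques and, to the best of the authors' knowledge, improves on the single-model state of the art in language modelling on the Penn Treebank.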
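The summary on this page does not spell out the mechanics of the technique. As a minimal sketch: the variational approach is commonly described as sampling one dropout mask per sequence and reusing it at every time step, on both the inputs and the recurrent connections, instead of resampling a mask at each step. The example below illustrates that idea on a plain tanh RNN in NumPy. The function names, the tanh cell (the paper works with LSTM and GRU models), and the inverted-dropout scaling are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sample_mask(rng, shape, p_drop):
    """Sample a Bernoulli keep-mask, scaled by 1/keep (inverted dropout).

    The scaling is an illustrative choice, not taken from the paper.
    """
    keep = 1.0 - p_drop
    return rng.binomial(1, keep, size=shape).astype(np.float64) / keep


def rnn_forward_tied_dropout(x, W_x, W_h, b, p_drop, rng):
    """Run a simple tanh RNN with dropout masks tied across time steps.

    x:   inputs of shape (T, batch, d_in)
    W_x: input weights of shape (d_in, d_hid)
    W_h: recurrent weights of shape (d_hid, d_hid)
    b:   bias of shape (d_hid,)
    Returns the hidden states (T, batch, d_hid) and the two masks used.
    """
    T, batch, d_in = x.shape
    d_hid = W_h.shape[0]

    # Masks are sampled ONCE per sequence and reused at every time step.
    mask_x = sample_mask(rng, (batch, d_in), p_drop)
    mask_h = sample_mask(rng, (batch, d_hid), p_drop)

    h = np.zeros((batch, d_hid))
    states = []
    for t in range(T):
        # The same units are dropped on the input and recurrent paths at each t.
        h = np.tanh((x[t] * mask_x) @ W_x + (h * mask_h) @ W_h + b)
        states.append(h)
    return np.stack(states), mask_x, mask_h
```

Resampling `mask_x` and `mask_h` inside the loop would instead give the naive per-step variant; keeping them outside the loop is what ties the masks across the sequence.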
Findings
Outperforms existing dropout techniques for RNNs on language modelling and sentiment analysis
Improves on the single-model state-of-the-art perplexity on the Penn Treebank
Grounds the use of dropout in RNNs in approximate Bayesian inference
Abstract
Recurrent neural networks (RNNs) stand at the forefront of many recent developments in deep learning. Yet a major difficulty with these models is their tendency to overfit, with dropout shown to fail when applied to recurrent layers. Recent results at the intersection of Bayesian modelling and deep learning offer a Bayesian interpretation of common deep learning techniques such as dropout. This grounding of dropout in approximate Bayesian inference suggests an extension of the theoretical results, offering insights into the use of dropout with RNN models. We apply this new variational inference based dropout technique in LSTM and GRU models, assessing it on language modelling and sentiment analysis tasks. The new approach outperforms existing techniques, and to the best of our knowledge improves on the single model state-of-the-art in language modelling with the Penn Treebank (73.4 test…
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms
Methods: Sigmoid Activation · Tanh Activation · Embedding Dropout · Variational Dropout · Dropout · Gated Recurrent Unit · Long Short-Term Memory
