Bayesian Recurrent Neural Networks
Meire Fortunato, Charles Blundell, Oriol Vinyals

TL;DR
This paper presents a simple variational Bayes approach for RNNs that improves uncertainty estimation, regularization, and performance, with novel posterior approximation techniques and broad applicability.
Contribution
It adapts truncated backpropagation through time to train Bayesian RNNs, and it introduces a new posterior approximation that incorporates local gradient information to sharpen the posterior around the current batch statistics.
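To make the first contribution concrete, here is a minimal NumPy sketch of one way a variational Bayes scheme can be combined with truncated backpropagation through time. It is an illustration under stated assumptions, not the authors' implementation: the diagonal Gaussian posterior, the zero-mean Gaussian prior, the softplus parameterisation of the standard deviation, the even split of the KL cost across truncated segments, and all function names (`sample_weights`, `kl_to_gaussian_prior`, `segment_loss`, `run_segment`) are hypothetical choices made for this example.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the standard deviation positive.
    return np.logaddexp(0.0, x)

def sample_weights(mu, rho, rng):
    # Reparameterised draw: theta = mu + sigma * eps, with sigma = softplus(rho).
    sigma = softplus(rho)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def kl_to_gaussian_prior(mu, rho, prior_sigma=1.0):
    # Closed-form KL[N(mu, sigma^2) || N(0, prior_sigma^2)], summed over weights.
    # Assumption: a plain Gaussian prior, chosen here because the KL is analytic.
    sigma = softplus(rho)
    per_weight = (np.log(prior_sigma / sigma)
                  + (sigma ** 2 + mu ** 2) / (2.0 * prior_sigma ** 2)
                  - 0.5)
    return np.sum(per_weight)

def segment_loss(nll, kl, num_batches, num_segments):
    # Assumption: the KL cost is spread evenly over every truncated segment
    # of every minibatch, so it is counted once per pass over the data.
    return nll + kl / (num_batches * num_segments)

def run_segment(x, h0, w_xh, w_hh):
    # One weight sample is reused at every time step of the truncated segment.
    h = h0
    for x_t in x:
        h = np.tanh(x_t @ w_xh + h @ w_hh)
    return h

rng = np.random.default_rng(0)
mu_xh, rho_xh = np.zeros((3, 4)), np.full((3, 4), -3.0)
mu_hh, rho_hh = np.zeros((4, 4)), np.full((4, 4), -3.0)

w_xh = sample_weights(mu_xh, rho_xh, rng)   # sampled once per segment
w_hh = sample_weights(mu_hh, rho_hh, rng)
h_final = run_segment(rng.standard_normal((5, 3)), np.zeros(4), w_xh, w_hh)

kl = kl_to_gaussian_prior(mu_xh, rho_xh) + kl_to_gaussian_prior(mu_hh, rho_hh)
loss = segment_loss(nll=2.0, kl=kl, num_batches=100, num_segments=10)
```

The `nll` value passed to `segment_loss` is a placeholder. In a real training loop it would be the negative log-likelihood of the segment under the sampled weights, and the gradients with respect to `mu` and `rho` would come from an automatic differentiation library.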
Findings
Bayesian RNNs outperform traditional RNNs on language modeling and image captioning.
The proposed methods reduce the number of parameters by 80% while maintaining accuracy.
A new benchmark for uncertainty in language models is introduced.
Abstract
In this work we explore a straightforward variational Bayes scheme for Recurrent Neural Networks. Firstly, we show that a simple adaptation of truncated backpropagation through time can yield good quality uncertainty estimates and superior regularisation at only a small extra computational cost during training, also reducing the amount of parameters by 80%. Secondly, we demonstrate how a novel kind of posterior approximation yields further improvements to the performance of Bayesian RNNs. We incorporate local gradient information into the approximate posterior to sharpen it around the current batch statistics. We show how this technique is not exclusive to recurrent neural networks and can be applied more widely to train Bayesian neural networks. We also empirically demonstrate how Bayesian RNNs are superior to traditional RNNs on a language modelling benchmark and an image captioning…
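The abstract's idea of sharpening the approximate posterior with local gradient information can be illustrated with a toy example. The sketch below is not taken from the paper. It assumes a hierarchical draw in which a first sample `phi` from a diagonal Gaussian is shifted one gradient step on the current batch and then perturbed with a small noise scale `sigma0`. A linear regression loss stands in for the RNN so that the gradient can be written by hand, and the scalar step size `eta` and all function names are hypothetical.

```python
import numpy as np

def batch_loss(w, x, y):
    # Mean squared error of a linear model on the current batch (toy stand-in).
    return 0.5 * np.mean((x @ w - y) ** 2)

def batch_grad(w, x, y):
    # Analytic gradient of batch_loss with respect to w.
    return x.T @ (x @ w - y) / len(y)

def sharpened_sample(mu, sigma, x, y, eta, sigma0, rng):
    # Step 1: draw phi from the unsharpened diagonal Gaussian posterior.
    phi = mu + sigma * rng.standard_normal(mu.shape)
    # Step 2: move phi along the negative batch gradient (local information).
    g = batch_grad(phi, x, y)
    # Step 3: draw theta around the shifted mean with a small noise scale.
    theta = phi - eta * g + sigma0 * rng.standard_normal(mu.shape)
    return phi, theta

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 3))
y = x @ np.array([1.0, -2.0, 0.5])          # noiseless toy targets

mu, sigma = np.zeros(3), 0.1 * np.ones(3)
phi, theta = sharpened_sample(mu, sigma, x, y, eta=0.1, sigma0=0.0, rng=rng)

print("loss at phi:  ", batch_loss(phi, x, y))
print("loss at theta:", batch_loss(theta, x, y))
```

With `sigma0=0.0` the sharpened sample is exactly one gradient step from `phi`, so for a small enough `eta` its loss on the current batch is lower. Setting `sigma0` above zero keeps the sharpened posterior stochastic.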
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms
