On Fast Dropout and its Applicability to Recurrent Networks
Justin Bayer, Christian Osendorfer, Daniela Korhammer, Nutan Chen, Sebastian Urban, Patrick van der Smagt

TL;DR
This paper analyzes fast dropout as a regularization technique for RNNs, revealing its adaptive quadratic form and potential to improve RNN performance on sequential data by avoiding biased weight dynamics.
Contribution
It provides a novel perspective on fast dropout, showing it implements an adaptive regularizer that benefits RNN training by avoiding biased weight attractors.
Findings
Fast dropout acts as an adaptive regularizer based on training error.
It enhances RNN performance on musical datasets.
The regularizer's derivative depends solely on training error signals.
Abstract
Recurrent Neural Networks (RNNs) are rich models for the processing of sequential data. Recent work on advancing the state of the art has been focused on the optimization or modelling of RNNs, mostly motivated by addressing the problems of the vanishing and exploding gradients. The control of overfitting has seen considerably less attention. This paper contributes to that by analyzing fast dropout, a recent regularization method for generalized linear models and neural networks from a back-propagation inspired perspective. We show that fast dropout implements a quadratic form of an adaptive, per-parameter regularizer, which rewards large weights in the light of underfitting, penalizes them for overconfident predictions and vanishes at minima of an unregularized training loss. The derivatives of that regularizer are exclusively based on the training error signal. One consequence of this…
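Illustration
The abstract does not spell out the mechanics of fast dropout, so the following is only a minimal sketch of the idea it builds on, under stated assumptions: instead of sampling Bernoulli dropout masks, the pre-activation of a single linear unit is summarized by the mean and variance of a moment-matched Gaussian. The function names, the keep probability of 0.5, and the random inputs are all hypothetical and chosen for illustration; this is not the authors' implementation and does not cover the recurrent case.

```python
import numpy as np


def fast_dropout_moments(x, w, keep_prob):
    """Closed-form mean and variance of a = sum_i w_i * x_i * d_i,
    where d_i ~ Bernoulli(keep_prob) independently (assumed setup)."""
    contrib = w * x
    mean = keep_prob * contrib.sum()
    var = keep_prob * (1.0 - keep_prob) * (contrib ** 2).sum()
    return mean, var


def sampled_dropout_moments(x, w, keep_prob, n_samples, rng):
    """Monte Carlo estimate of the same two moments using explicit masks."""
    masks = rng.random((n_samples, x.shape[0])) < keep_prob
    a = (masks * (w * x)).sum(axis=1)
    return a.mean(), a.var()


rng = np.random.default_rng(0)
x = rng.normal(size=50)  # hypothetical input vector
w = rng.normal(size=50)  # hypothetical weight vector

mean_fd, var_fd = fast_dropout_moments(x, w, keep_prob=0.5)
mean_mc, var_mc = sampled_dropout_moments(x, w, 0.5, 100_000, rng)

print("closed form:", mean_fd, var_fd)
print("sampled:    ", mean_mc, var_mc)
```

The closed-form moments should agree with the sampled ones up to Monte Carlo noise, which is why the sampling step can be dropped for this unit. The variance term is quadratic in the weights, which loosely matches the abstract's description of a quadratic, per-parameter regularizer.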
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference
