On Fast Dropout and its Applicability to Recurrent Networks

Justin Bayer; Christian Osendorfer; Daniela Korhammer; Nutan Chen; Sebastian Urban; Patrick van der Smagt

arXiv:1311.0701 · stat.ML · March 6, 2014 · ICLR · 45 cites

TL;DR

This paper analyzes fast dropout as a regularization technique for RNNs. It shows that fast dropout implements a quadratic form of an adaptive, per-parameter regularizer, and argues that it can improve RNN performance on sequential data by avoiding biased weight dynamics.

Contribution

The paper offers a back-propagation-inspired perspective on fast dropout, showing that it implements an adaptive regularizer that benefits RNN training by avoiding biased weight attractors.

Findings

01

Fast dropout acts as an adaptive regularizer based on training error.

02

It enhances RNN performance on musical datasets.

03

The regularizer's derivative depends solely on training error signals.

Abstract

Recurrent Neural Networks (RNNs) are rich models for the processing of sequential data. Recent work on advancing the state of the art has focused on the optimization or modelling of RNNs, mostly motivated by addressing the problems of vanishing and exploding gradients. The control of overfitting has seen considerably less attention. This paper contributes to that by analyzing fast dropout, a recent regularization method for generalized linear models and neural networks, from a back-propagation inspired perspective. We show that fast dropout implements a quadratic form of an adaptive, per-parameter regularizer, which rewards large weights in the light of underfitting, penalizes them for overconfident predictions and vanishes at minima of an unregularized training loss. The derivatives of that regularizer are exclusively based on the training error signal. One consequence of this…
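
The abstract names fast dropout without describing its mechanics. As a rough illustration only, the sketch below assumes the usual description of fast dropout: instead of sampling Bernoulli dropout masks, the mean and variance of a unit's pre-activation are computed in closed form. The function names, weights, inputs, and keep probability are hypothetical choices made for this example, and nothing here is taken from the paper's own code.

```python
import random


def fast_dropout_moments(weights, inputs, keep_prob):
    """Closed-form mean and variance of sum_i w_i * d_i * x_i,
    where each mask entry d_i ~ Bernoulli(keep_prob) independently.
    (Hypothetical helper for illustration.)"""
    products = [w * x for w, x in zip(weights, inputs)]
    mean = keep_prob * sum(products)
    var = keep_prob * (1.0 - keep_prob) * sum(p * p for p in products)
    return mean, var


def sampled_dropout_moments(weights, inputs, keep_prob, n_samples, seed=0):
    """Monte Carlo estimate of the same two moments by sampling dropout masks."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_samples):
        # Keep each input with probability keep_prob, drop it otherwise.
        total = sum(w * x for w, x in zip(weights, inputs)
                    if rng.random() < keep_prob)
        totals.append(total)
    mean = sum(totals) / n_samples
    var = sum((t - mean) ** 2 for t in totals) / n_samples
    return mean, var


# Illustrative values only (assumptions, not from the paper).
weights = [0.5, -1.0, 2.0, 0.25]
inputs = [1.0, 2.0, -0.5, 4.0]

analytic = fast_dropout_moments(weights, inputs, keep_prob=0.5)
sampled = sampled_dropout_moments(weights, inputs, keep_prob=0.5, n_samples=20000)
print("analytic:", analytic)
print("sampled: ", sampled)
```

The sampled moments should approach the analytic ones as the number of samples grows, which is the sense in which the closed-form pass stands in for repeated mask sampling.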

Code & Models

Repositories

George091/RNN
tf

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference