Regularizing RNNs by Stabilizing Activations

David Krueger; Roland Memisevic

arXiv:1511.08400 · cs.NE · April 27, 2016 · ICLR · 30 cites

TL;DR

This paper introduces a regularizer for RNNs that penalizes the squared difference between the norms of successive hidden states, which stabilizes activations and improves performance and generalization on sequence tasks.

Contribution

The authors propose a penalty term that stabilizes RNN activations and acts as a regularizer for RNNs including LSTMs and IRNNs, improving performance on character-level language modeling and phoneme recognition.

Findings

01  Improved performance on character-level language modeling and phoneme recognition.

02  IRNNs with the penalty outperform weight noise and dropout.

03  Stabilized IRNNs generalize better to longer sequences.

Abstract

We stabilize the activations of Recurrent Neural Networks (RNNs) by penalizing the squared distance between successive hidden states' norms. This penalty term is an effective regularizer for RNNs including LSTMs and IRNNs, improving performance on character-level language modeling and phoneme recognition, and outperforming weight noise and dropout. We achieve competitive performance (18.6% PER) on the TIMIT phoneme recognition task for RNNs evaluated without beam search or an RNN transducer. With this penalty term, IRNN can achieve similar performance to LSTM on language modeling, although adding the penalty term to the LSTM results in superior performance. Our penalty term also prevents the exponential growth of IRNN's activations outside of their training horizon, allowing them to generalize to much longer sequences.
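
As an illustration of the penalty the abstract describes, the following is a minimal sketch in plain Python rather than the authors' implementation. The function names (`l2_norm`, `norm_stabilizer_penalty`), the weight `beta`, and the choice to average over successive pairs of hidden states are assumptions made for this example. It computes the mean squared difference between the L2 norms of consecutive hidden states, scaled by `beta`.

```python
import math


def l2_norm(vec):
    """Euclidean (L2) norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def norm_stabilizer_penalty(hidden_states, beta=1.0):
    """Sketch of a norm-stabilizing penalty (names and averaging are assumptions).

    hidden_states: list of hidden-state vectors, one per time step.
    Returns beta times the mean of (||h_t|| - ||h_{t-1}||)^2 over
    successive pairs of time steps.
    """
    norms = [l2_norm(h) for h in hidden_states]
    sq_diffs = [(curr - prev) ** 2 for prev, curr in zip(norms, norms[1:])]
    if not sq_diffs:
        # Fewer than two time steps: there is no successive pair to penalize.
        return 0.0
    return beta * sum(sq_diffs) / len(sq_diffs)


# Hidden states whose direction changes but whose norm stays at 5.
steady = [[3.0, 4.0], [4.0, 3.0], [0.0, 5.0]]
# Hidden states whose norm doubles each step: 5, 10, 20.
growing = [[3.0, 4.0], [6.0, 8.0], [12.0, 16.0]]

steady_penalty = norm_stabilizer_penalty(steady)
growing_penalty = norm_stabilizer_penalty(growing)
print(steady_penalty, growing_penalty)
```

The penalty depends only on the norms, so the `steady` sequence incurs no cost even though the hidden state itself changes at every step, while the `growing` sequence is penalized. In training, a term like this would be added to the task loss, with `beta` controlling its strength.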

Code & Models

Repositories

vimarshc/fastai_experiments · tf

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory