Regularizing RNNs by Stabilizing Activations
David Krueger, Roland Memisevic

TL;DR
This paper introduces a regularization technique for RNNs that stabilizes activations by penalizing changes in hidden state norms, leading to improved performance and better generalization on sequence tasks.
Contribution
The authors propose a penalty on the squared difference between the norms of successive hidden states. It acts as a regularizer for RNNs, including LSTMs and IRNNs, on character-level language modeling and phoneme recognition.
Findings
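The penalty improves performance on character-level language modeling and phoneme recognition.
As a regularizer, the penalty outperforms weight noise and dropout.
With the penalty, IRNN activations no longer grow exponentially beyond the training horizon, so the networks generalize to much longer sequences.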
Abstract
We stabilize the activations of Recurrent Neural Networks (RNNs) by penalizing the squared distance between successive hidden states' norms. This penalty term is an effective regularizer for RNNs including LSTMs and IRNNs, improving performance on character-level language modeling and phoneme recognition, and outperforming weight noise and dropout. We achieve competitive performance (18.6% PER) on the TIMIT phoneme recognition task for RNNs evaluated without beam search or an RNN transducer. With this penalty term, IRNN can achieve similar performance to LSTM on language modeling, although adding the penalty term to the LSTM results in superior performance. Our penalty term also prevents the exponential growth of IRNN's activations outside of their training horizon, allowing them to generalize to much longer sequences.
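The penalty described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The function name `norm_stabilizer_penalty`, the scaling coefficient `beta`, and the choice to average over the successive pairs are assumptions made for illustration. The abstract only states that the squared distance between successive hidden states' norms is penalized.

```python
import math
from typing import Sequence


def norm_stabilizer_penalty(
    hidden_states: Sequence[Sequence[float]],
    beta: float = 1.0,  # hypothetical scaling coefficient (assumption)
) -> float:
    """Mean squared difference between the L2 norms of successive hidden states.

    hidden_states holds one vector per time step. Averaging over the
    successive pairs is an assumption of this sketch.
    """
    # L2 norm of the hidden state at each time step.
    norms = [math.sqrt(sum(x * x for x in h)) for h in hidden_states]
    if len(norms) < 2:
        return 0.0  # no successive pair to compare
    # Squared change in norm between each step and the next.
    squared_diffs = [(b - a) ** 2 for a, b in zip(norms, norms[1:])]
    return beta * sum(squared_diffs) / len(squared_diffs)


# Norms are 5, 10, 10, so the squared differences are 25 and 0.
states = [[3.0, 4.0], [6.0, 8.0], [6.0, 8.0]]
penalty = norm_stabilizer_penalty(states)
```

In training, a term like this would be added to the task loss. The penalty depends only on the norms, so a hidden state that changes direction while keeping the same norm incurs no cost.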
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
