Recurrent Batch Normalization
Tim Cooijmans, Nicolas Ballas, César Laurent, Çağlar Gülçehre, Aaron Courville

TL;DR
This paper introduces a novel reparameterization of LSTM that applies batch normalization to hidden-to-hidden transitions, leading to faster training and better generalization in sequence tasks.
Contribution
It extends batch normalization to the hidden-to-hidden transition of LSTMs, where previous work applied it only to the input-to-hidden transformation. The resulting batch-normalized LSTM converges faster and generalizes better.
Findings
Faster convergence in training.
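Improved generalization across sequence classification, language modeling, and question answering.
Batch-normalizing the hidden-to-hidden transition is both possible and beneficial, reducing internal covariate shift between time steps.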
Abstract
We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering. Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization.
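The reparameterization described in the abstract can be sketched as a single recurrent step. This is a minimal NumPy illustration and not the authors' implementation. It assumes that the input-to-hidden and hidden-to-hidden terms are normalized separately before being summed, that the cell state is normalized before the output nonlinearity, and that only training-mode batch statistics are used. The names `batch_norm`, `bn_lstm_step`, `init_params`, the parameter keys, and the initial scale of 0.1 are choices made for this sketch.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis (training-mode statistics only)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hid, rng, gamma_init=0.1):
    """Hypothetical parameter layout; gamma_init=0.1 is an assumption of this sketch."""
    return {
        "W_x": rng.standard_normal((n_in, 4 * n_hid)) * 0.1,
        "W_h": rng.standard_normal((n_hid, 4 * n_hid)) * 0.1,
        "b": np.zeros(4 * n_hid),
        "gamma_x": np.full(4 * n_hid, gamma_init),
        "gamma_h": np.full(4 * n_hid, gamma_init),
        "gamma_c": np.full(n_hid, gamma_init),
        "beta_c": np.zeros(n_hid),
    }


def bn_lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step with batch-normalized transitions (illustrative sketch)."""
    # Normalize the input-to-hidden and hidden-to-hidden terms separately,
    # then add one shared bias (so the per-term BN offsets are fixed at zero).
    pre = (
        batch_norm(x_t @ p["W_x"], p["gamma_x"], 0.0)
        + batch_norm(h_prev @ p["W_h"], p["gamma_h"], 0.0)
        + p["b"]
    )
    f, i, o, g = np.split(pre, 4, axis=1)
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    # The cell state is normalized before it feeds the output gate path.
    h_t = sigmoid(o) * np.tanh(batch_norm(c_t, p["gamma_c"], p["beta_c"]))
    return h_t, c_t


# Run a few steps on random data: 5 time steps, batch of 8, 3 input features.
rng = np.random.default_rng(0)
params = init_params(n_in=3, n_hid=4, rng=rng)
xs = rng.standard_normal((5, 8, 3))
h = np.zeros((8, 4))
c = np.zeros((8, 4))
for x_t in xs:
    h, c = bn_lstm_step(x_t, h, c, params)
```

The sketch recomputes batch statistics at every step and omits the running averages that would be needed at inference time, so it only illustrates where the normalization is applied.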
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
