Recurrent Batch Normalization

Tim Cooijmans; Nicolas Ballas; César Laurent; Çağlar Gülçehre; Aaron Courville

arXiv:1603.09025 · cs.LG · March 1, 2017 · 58 cites

TL;DR

This paper introduces a novel reparameterization of LSTM that applies batch normalization to hidden-to-hidden transitions, leading to faster training and better generalization in sequence tasks.

Contribution

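Earlier work applied batch normalization only to the input-to-hidden transformation of RNNs. This paper extends it to the hidden-to-hidden transition of LSTMs, reducing internal covariate shift between time steps and improving both convergence speed and generalization.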

Findings

01 Faster convergence in training.

02 Improved generalization across sequence classification, language modeling, and question answering.

03 Batch-normalizing the hidden-to-hidden transition is both possible and beneficial.

Abstract

We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering. Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization.
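
To make the idea in the abstract concrete, the following is a minimal sketch of a single batch-normalized LSTM step in plain Python. The abstract does not spell out the equations, so the exact form here is an assumption: the input-to-hidden and hidden-to-hidden terms are normalized separately before being summed, and the cell state is normalized before it feeds the output. The names `batch_norm`, `matmul`, and `bn_lstm_step`, the gate ordering, and the scale `gamma=0.1` are all illustrative choices, not taken from this page. The sketch uses mini-batch statistics at one time step only and does not cover how statistics are handled at inference time.

```python
import math
import random


def batch_norm(rows, gamma, beta=0.0, eps=1e-5):
    """Normalize each feature (column) over the batch (rows), then scale and shift."""
    n, dim = len(rows), len(rows[0])
    out = [[0.0] * dim for _ in range(n)]
    for j in range(dim):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        for i in range(n):
            out[i][j] = gamma * (col[i] - mean) / math.sqrt(var + eps) + beta
    return out


def matmul(a, b):
    """Multiply a (n x k) by b (k x m), both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def bn_lstm_step(x, h_prev, c_prev, w_x, w_h, bias, gamma=0.1):
    """One LSTM step with batch norm on the input and recurrent terms (assumed form)."""
    hidden = len(h_prev[0])
    # Normalize the two pre-activation terms separately. The shared bias
    # supplies the shift, so beta is left at 0 for both.
    from_x = batch_norm(matmul(x, w_x), gamma)
    from_h = batch_norm(matmul(h_prev, w_h), gamma)

    c_new, out_gates = [], []
    for fx, fh, c_row in zip(from_x, from_h, c_prev):
        pre = [a + b + k for a, b, k in zip(fx, fh, bias)]
        # Assumed gate layout: forget, input, output, candidate.
        f, i, o, g = (pre[s * hidden:(s + 1) * hidden] for s in range(4))
        c_new.append([sigmoid(fv) * cv + sigmoid(iv) * math.tanh(gv)
                      for fv, iv, gv, cv in zip(f, i, g, c_row)])
        out_gates.append([sigmoid(ov) for ov in o])

    # Normalize the new cell state over the batch before the output nonlinearity.
    c_norm = batch_norm(c_new, gamma)
    h_new = [[og * math.tanh(cn) for og, cn in zip(o_row, c_row)]
             for o_row, c_row in zip(out_gates, c_norm)]
    return h_new, c_new


# Tiny usage example with random data (sizes are arbitrary).
random.seed(0)
batch, n_in, hidden = 4, 3, 2
x = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(batch)]
h0 = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(batch)]
c0 = [[0.0] * hidden for _ in range(batch)]
w_x = [[random.gauss(0, 0.1) for _ in range(4 * hidden)] for _ in range(n_in)]
w_h = [[random.gauss(0, 0.1) for _ in range(4 * hidden)] for _ in range(hidden)]
bias = [0.0] * (4 * hidden)
h1, c1 = bn_lstm_step(x, h0, c0, w_x, w_h, bias)
```

Keeping the two normalizations separate means the input and recurrent contributions each get their own scale, so neither dominates the gate pre-activations simply because its raw magnitude is larger. A real implementation would use a tensor library and learnable per-feature `gamma` and `beta` rather than the scalar values used here.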

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory