Batch Normalized Recurrent Neural Networks
César Laurent, Gabriel Pereyra, Philémon Brakel, Ying Zhang, and Yoshua Bengio

TL;DR
This paper investigates applying batch normalization to recurrent neural networks and finds it challenging to use effectively. Normalizing hidden-to-hidden transitions does not help training, while normalizing input-to-hidden transitions can speed up convergence of the training criterion.
Contribution
The study systematically evaluates where batch normalization can be applied in RNNs. It shows that the benefit is limited and depends on which transitions are normalized.
Findings
Batch normalization applied to RNNs does not improve training speed when used on hidden-to-hidden transitions.
Applying batch normalization to input-to-hidden transitions speeds up convergence but does not enhance generalization.
Certain variants of batch normalization can be beneficial for RNN training despite overall challenges.
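The two placements compared in these findings can be written for a simple recurrent update as follows. This is a sketch. The notation (input weights W_x, recurrent weights W_h, elementwise nonlinearity φ, learned scale γ and shift β, small constant ε) is assumed for illustration and is not taken from the paper. μ_B and σ²_B denote the per-feature mean and variance of z over the mini-batch.

```latex
\begin{align*}
\text{hidden-to-hidden:}\quad h_t &= \phi\big(W_x x_t + \mathrm{BN}(W_h h_{t-1})\big)\\
\text{input-to-hidden:}\quad h_t &= \phi\big(\mathrm{BN}(W_x x_t) + W_h h_{t-1}\big)\\
\mathrm{BN}(z) &= \gamma \odot \frac{z - \mu_{\mathcal{B}}}{\sqrt{\sigma^2_{\mathcal{B}} + \epsilon}} + \beta
\end{align*}
```

Only the placement of BN differs between the first two lines. The recurrent connection W_h h_{t-1} is identical in the input-to-hidden variant, which is the configuration the findings associate with faster convergence.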
Abstract
Recurrent Neural Networks (RNNs) are powerful models for sequential data that have the potential to learn long-term dependencies. However, they are computationally expensive to train and difficult to parallelize. Recent work has shown that normalizing intermediate representations of neural networks can significantly improve convergence rates in feedforward neural networks. In particular, batch normalization, which uses mini-batch statistics to standardize features, was shown to significantly reduce training time. In this paper, we show that applying batch normalization to the hidden-to-hidden transitions of our RNNs doesn't help the training procedure. We also show that when applied to the input-to-hidden transitions, batch normalization can lead to a faster convergence of the training criterion but doesn't seem to improve the generalization performance on both our language modelling…
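The abstract describes batch normalization as standardizing features with mini-batch statistics. The following minimal NumPy sketch applies that idea to the input-to-hidden transition of a simple tanh RNN. The function names (`batch_norm`, `rnn_forward`), the tanh nonlinearity, the row-vector convention (`x_t @ W_x`), and computing statistics separately at each time step are all assumptions for illustration, not the authors' implementation. It also omits the running averages that batch normalization normally uses at inference time.

```python
import numpy as np


def batch_norm(z, gamma, beta, eps=1e-5):
    """Standardize each feature of z (shape: batch x features) using
    mini-batch statistics, then apply a learned scale and shift."""
    mu = z.mean(axis=0)            # per-feature mini-batch mean
    var = z.var(axis=0)            # per-feature mini-batch (biased) variance
    z_hat = (z - mu) / np.sqrt(var + eps)
    return gamma * z_hat + beta


def rnn_forward(x, W_x, W_h, gamma, beta, h0):
    """Run a tanh RNN over x (shape: time x batch x input_dim).

    Batch normalization is applied only to the input-to-hidden term
    x_t @ W_x; the hidden-to-hidden term h @ W_h is left unnormalized.
    Statistics are computed separately at every time step (an assumption).
    """
    h = h0
    states = []
    for x_t in x:                                  # iterate over time steps
        pre = batch_norm(x_t @ W_x, gamma, beta) + h @ W_h
        h = np.tanh(pre)
        states.append(h)
    return np.stack(states)                        # time x batch x hidden_dim


# Usage with random data (hypothetical sizes).
rng = np.random.default_rng(0)
T, B, D, H = 5, 8, 3, 4                            # time, batch, input, hidden
x = rng.normal(size=(T, B, D))
W_x = rng.normal(size=(D, H)) * 0.1
W_h = rng.normal(size=(H, H)) * 0.1
gamma, beta = np.ones(H), np.zeros(H)
hs = rnn_forward(x, W_x, W_h, gamma, beta, np.zeros((B, H)))
print(hs.shape)  # (5, 8, 4)
```

Because `x_t @ W_x` does not depend on the hidden state, the input projections for all time steps could also be computed and normalized in one batched operation before the loop. The loop form is used here only for readability.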
Taxonomy
Methods: Batch Normalization
