Improving the Gating Mechanism of Recurrent Neural Networks
Albert Gu, Caglar Gulcehre, Tom Le Paine, Matt Hoffman, Razvan Pascanu

TL;DR
This paper proposes simple modifications to gating mechanisms in recurrent neural networks that enhance their learnability and performance, especially in tasks requiring long-term dependencies, without adding hyperparameters.
Contribution
The authors introduce two easy-to-implement modifications to standard gating mechanisms that improve gradient flow and learnability in recurrent models.
Findings
Enhanced recurrent models perform better on long-term dependency tasks.
Modifications improve gradient propagation in saturated gating regimes.
Empirical results show robustness across various applications.
Abstract
Gating mechanisms are widely used in neural network models, where they allow gradients to backpropagate more easily through depth or time. However, their saturation property introduces problems of its own. For example, in recurrent models these gates need to have outputs near 1 to propagate information over long time-delays, which requires them to operate in their saturation regime and hinders gradient-based learning of the gate mechanism. We address this problem by deriving two synergistic modifications to the standard gating mechanism that are easy to implement, introduce no additional hyperparameters, and improve learnability of the gates when they are close to saturation. We show how these changes are related to and improve on alternative recently proposed gating mechanisms such as chrono initialization and Ordered Neurons. Empirically, our simple gating mechanisms robustly improve…
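The saturation problem described in the abstract can be illustrated numerically. The sketch below is not taken from the paper's code. It assumes a plain sigmoid gate `f = sigmoid(x)`, uses the standard identity that the sigmoid's derivative is `f * (1 - f)`, and treats `1 / (1 - f)` as a rough proxy for how many steps a state multiplied by `f` at each step persists. The names `gate_stats` and `horizon` are illustrative only.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_stats(pre_activation: float):
    """Return (gate value, local gradient, rough memory horizon).

    Assumption: the memory horizon of a gate with value f is
    approximated by 1 / (1 - f), which is a heuristic, not a
    quantity defined in the summary above.
    """
    f = sigmoid(pre_activation)
    grad = f * (1.0 - f)       # derivative of sigmoid w.r.t. its input
    horizon = 1.0 / (1.0 - f)  # grows as the gate approaches 1
    return f, grad, horizon


if __name__ == "__main__":
    for x in (0.0, 2.0, 4.0, 6.0, 8.0):
        f, grad, horizon = gate_stats(x)
        print(f"x={x:4.1f}  gate={f:.5f}  grad={grad:.2e}  horizon~{horizon:8.1f}")
```

Under these assumptions the gradient equals `f / horizon`, so a gate that remembers over thousands of steps receives a gradient thousands of times smaller than one at `f = 0.5`. This is the regime in which the abstract says gradient-based learning of the gate is hindered.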
Taxonomy
Topics: Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
