Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Aaron Courville, Chris Pal

TL;DR
Zoneout is a regularization technique for RNNs that, at each timestep, randomly forces some hidden units to keep their previous values. This eases the flow of gradient and state information through time and improves performance on language modeling and other sequential tasks.
Contribution
Introduces zoneout, a regularization method for RNNs that stochastically holds some hidden units at their previous values at each timestep, improving generalization and making gradient and state information easier to propagate through time.
Findings
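Zoneout gives competitive results with relatively simple models on character- and word-level language modeling (Penn Treebank and Text8).
Combining zoneout with recurrent batch normalization achieves state-of-the-art results on permuted sequential MNIST.
In an empirical comparison of several RNN regularizers, zoneout gives significant performance improvements across tasks.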
Abstract
We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving generalization. But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feedforward stochastic depth networks. We perform an empirical investigation of various RNN regularizers, and find that zoneout gives significant performance improvements across tasks. We achieve competitive results with relatively simple models in character- and word-level language modelling on the Penn Treebank and Text8 datasets, and combining with recurrent batch normalization yields state-of-the-art results on permuted sequential MNIST.
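The per-timestep update described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `zoneout`, `rnn_step`, and `rate` are hypothetical, the cell is a plain tanh RNN chosen for brevity, and at inference the random mask is replaced by its expectation, by analogy with dropout.

```python
import numpy as np


def zoneout(h_prev, h_new, rate, training=True, rng=None):
    """Mix the previous and the freshly computed hidden state.

    rate is the (assumed) probability that a unit keeps its previous value.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    if training:
        rng = np.random.default_rng() if rng is None else rng
        # True where a unit is "zoned out", i.e. keeps its previous value.
        keep_prev = rng.random(h_prev.shape) < rate
        return np.where(keep_prev, h_prev, h_new)
    # Inference: use the expected value of the random mask (assumption,
    # mirroring how dropout is usually handled at test time).
    return rate * h_prev + (1.0 - rate) * h_new


def rnn_step(x, h_prev, W_x, W_h, b):
    """One step of a plain tanh RNN (illustrative cell, not from the paper)."""
    return np.tanh(x @ W_x + h_prev @ W_h + b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, n_in, n_hid, steps = 2, 3, 4, 5
    W_x = rng.normal(scale=0.1, size=(n_in, n_hid))
    W_h = rng.normal(scale=0.1, size=(n_hid, n_hid))
    b = np.zeros(n_hid)

    h = np.zeros((batch, n_hid))
    for _ in range(steps):
        x = rng.normal(size=(batch, n_in))
        h_candidate = rnn_step(x, h, W_x, W_h, b)
        # Each unit either takes the candidate value or is held over unchanged.
        h = zoneout(h, h_candidate, rate=0.15, training=True, rng=rng)
    print(h.shape)
```

Because a zoned-out unit copies its previous value exactly, that unit acts as an identity connection across the timestep, which is the sense in which the abstract compares zoneout to stochastic depth in feedforward networks.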
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Zoneout · Stochastic Depth
