Z-Forcing: Training Stochastic Recurrent Networks
Anirudh Goyal, Alessandro Sordoni, Marc-Alexandre Côté, Nan Rosemary Ke, Yoshua Bengio

TL;DR
This paper introduces Z-Forcing, a stochastic recurrent model trained with an auxiliary cost that eases the learning of its latent variables. It achieves state-of-the-art results on speech benchmarks, performs competitively on sequential MNIST, and helps learn interpretable latent variables in language modeling.
Contribution
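The paper proposes Z-Forcing, a stochastic recurrent network in which each step's latent variable conditions the recurrent dynamics of future steps. An auxiliary cost forces these latent variables to reconstruct the state of a backward RNN, which eases their training and improves performance on sequential data.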
Findings
Achieves state-of-the-art results on TIMIT and Blizzard speech benchmarks.
Performs competitively on sequential MNIST.
Helps learn interpretable latent variables in language modeling.
Abstract
Many efforts have been devoted to training generative latent variable models with autoregressive decoders, such as recurrent neural networks (RNN). Stochastic recurrent models have been successful in capturing the variability observed in natural sequential data such as speech. We unify successful ideas from recently proposed architectures into a stochastic recurrent model: each step in the sequence is associated with a latent variable that is used to condition the recurrent dynamics for future steps. Training is performed with amortized variational inference where the approximate posterior is augmented with an RNN that runs backward through the sequence. In addition to maximizing the variational lower bound, we ease training of the latent variables by adding an auxiliary cost which forces them to reconstruct the state of the backward recurrent network. This provides the latent variables…
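The abstract combines three pieces: a backward RNN that feeds the approximate posterior, a per-step latent variable that conditions the forward recurrence, and an auxiliary cost that asks each latent to reconstruct the backward state. The NumPy sketch below shows how these could combine into one loss for a single sequence. It is an illustration under stated assumptions, not the authors' code: all distributions are assumed to be diagonal Gaussians, both RNNs are assumed to be single-layer tanh cells, the exact conditioning sets are assumptions, and `params`, `zforcing_loss` and `aux_weight` are hypothetical names. Random weights stand in for trained parameters, and only the loss is computed (no gradient step).

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, Z = 3, 8, 2  # hypothetical input, hidden and latent sizes

def linear(n_in, n_out):
    # Random weights: a hypothetical stand-in for trained parameters.
    return rng.normal(scale=0.1, size=(n_out, n_in))

params = {
    "bwd": linear(D + H, H),      # backward RNN: b_t from [x_t, b_{t+1}]
    "fwd": linear(D + Z + H, H),  # forward RNN: h_t from [x_t, z_t, h_{t-1}]
    "prior": linear(H, 2 * Z),    # assumed prior p(z_t | h_{t-1})
    "post": linear(2 * H, 2 * Z), # assumed posterior q(z_t | h_{t-1}, b_t)
    "dec": linear(H + Z, 2 * D),  # assumed decoder p(x_t | h_{t-1}, z_t)
    "aux": linear(Z, 2 * H),      # assumed auxiliary decoder p(b_t | z_t)
}

def split(v):
    # Split a vector into Gaussian mean and log-variance halves.
    mu, logvar = np.split(v, 2)
    return mu, logvar

def gaussian_log_prob(x, mu, logvar):
    # Log-density of x under a diagonal Gaussian, summed over dimensions.
    return -0.5 * np.sum(np.log(2 * np.pi) + logvar + (x - mu) ** 2 / np.exp(logvar))

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    # KL(q || p) between diagonal Gaussians, summed over dimensions.
    return 0.5 * np.sum(
        logvar_p - logvar_q
        + (np.exp(logvar_q) + (mu_q - mu_p) ** 2) / np.exp(logvar_p)
        - 1.0
    )

def zforcing_loss(x, aux_weight=1.0):
    """Return (negative ELBO + aux_weight * aux cost, negative ELBO, aux cost)."""
    T = x.shape[0]
    # 1) Run the backward RNN from the last step to the first.
    b = np.zeros((T, H))
    b_next = np.zeros(H)
    for t in reversed(range(T)):
        b_next = np.tanh(params["bwd"] @ np.concatenate([x[t], b_next]))
        b[t] = b_next
    # 2) Run the forward stochastic RNN, accumulating the loss terms.
    h = np.zeros(H)
    nll = kl = aux = 0.0
    for t in range(T):
        mu_p, lv_p = split(params["prior"] @ h)
        mu_q, lv_q = split(params["post"] @ np.concatenate([h, b[t]]))
        # Reparameterized sample z_t ~ q.
        z = mu_q + np.exp(0.5 * lv_q) * rng.standard_normal(Z)
        mu_x, lv_x = split(params["dec"] @ np.concatenate([h, z]))
        nll -= gaussian_log_prob(x[t], mu_x, lv_x)
        kl += gaussian_kl(mu_q, lv_q, mu_p, lv_p)
        # Auxiliary cost: z_t must reconstruct the backward state b_t.
        mu_b, lv_b = split(params["aux"] @ z)
        aux -= gaussian_log_prob(b[t], mu_b, lv_b)
        # The latent conditions the recurrent dynamics of future steps.
        h = np.tanh(params["fwd"] @ np.concatenate([x[t], z, h]))
    neg_elbo = nll + kl
    return neg_elbo + aux_weight * aux, neg_elbo, aux
```

Usage: `total, neg_elbo, aux = zforcing_loss(np.random.default_rng(1).normal(size=(5, D)))`. Setting `aux_weight=0.0` recovers a plain variational objective, so the weight makes it easy to compare training with and without the auxiliary reconstruction term.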
Taxonomy
Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Speech Recognition and Synthesis
