Sequential Neural Models with Stochastic Layers
Marco Fraccaro, Søren Kaae Sønderby, Ulrich Paquet, Ole Winther

TL;DR
This paper presents stochastic recurrent neural networks that combine deterministic RNNs with state space models to better propagate uncertainty, leading to improved speech modeling performance.
Contribution
It introduces a novel stochastic RNN architecture with a structured variational inference approach, enhancing uncertainty modeling in sequential neural networks.
Findings
Significantly improves speech modeling results on Blizzard and TIMIT datasets.
Achieves performance comparable to competing methods on polyphonic music modeling.
Effectively propagates uncertainty in latent states through the model.
Abstract
How can we efficiently propagate uncertainty in a latent state representation with recurrent neural networks? This paper introduces stochastic recurrent neural networks which glue a deterministic recurrent neural network and a state space model together to form a stochastic and sequential neural generative model. The clear separation of deterministic and stochastic layers allows a structured variational inference network to track the factorization of the model's posterior distribution. By retaining both the nonlinear recursive structure of a recurrent neural network and averaging over the uncertainty in a latent path, like a state space model, we improve the state of the art results on the Blizzard and TIMIT speech modeling data sets by a large margin, while achieving comparable performances to competing methods on polyphonic music modeling.
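The layering described in the abstract, a deterministic recurrent layer feeding a stochastic state space layer, can be illustrated with a minimal ancestral-sampling sketch. This is not the authors' implementation. The abstract does not specify the parameterization, so the tanh recurrence, the Gaussian latent transition, the linear emission mean, and every name below (`init_params`, `sample_sequence`, the `W_*` weight keys) are assumptions made for illustration only.

```python
import numpy as np


def init_params(input_dim, d_dim, z_dim, x_dim, rng, scale=0.1):
    """Random weights for the illustrative model (all names are hypothetical)."""
    shapes = {
        "W_dd": (d_dim, d_dim),      # deterministic state -> deterministic state
        "W_du": (d_dim, input_dim),  # input -> deterministic state
        "W_zz": (z_dim, z_dim),      # previous latent -> transition hidden
        "W_zd": (z_dim, d_dim),      # deterministic state -> transition hidden
        "W_mu": (z_dim, z_dim),      # transition hidden -> latent mean
        "W_sig": (z_dim, z_dim),     # transition hidden -> latent log std
        "W_xz": (x_dim, z_dim),      # latent -> emission mean
        "W_xd": (x_dim, d_dim),      # deterministic state -> emission mean
    }
    return {name: scale * rng.standard_normal(shape) for name, shape in shapes.items()}


def sample_sequence(u, params, rng):
    """Sample latents z_1..T and emission means x_1..T given inputs u_1..T."""
    d = np.zeros(params["W_dd"].shape[0])
    z = np.zeros(params["W_zz"].shape[0])
    xs, zs = [], []
    for u_t in u:
        # Deterministic layer: an ordinary RNN update, no randomness involved.
        d = np.tanh(params["W_dd"] @ d + params["W_du"] @ u_t)
        # Stochastic layer: Gaussian transition conditioned on z_{t-1} and d_t.
        h = np.tanh(params["W_zz"] @ z + params["W_zd"] @ d)
        mu = params["W_mu"] @ h
        log_sigma = params["W_sig"] @ h
        z = mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)
        # Emission mean depends on both the stochastic and deterministic states.
        x = params["W_xz"] @ z + params["W_xd"] @ d
        xs.append(x)
        zs.append(z)
    return np.stack(xs), np.stack(zs)


params = init_params(input_dim=3, d_dim=8, z_dim=4, x_dim=5, rng=np.random.default_rng(0))
inputs = np.ones((10, 3))
x_a, z_a = sample_sequence(inputs, params, np.random.default_rng(1))
x_b, z_b = sample_sequence(inputs, params, np.random.default_rng(2))
```

Because the randomness enters only through the latent path, the two calls above share the same deterministic states but draw different latent trajectories, which is one way to picture averaging over the uncertainty in a latent path. Training such a model would additionally need the structured variational inference network mentioned in the abstract, which this sketch leaves out.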
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Model Reduction and Neural Networks
