Sequential Neural Models with Stochastic Layers

Marco Fraccaro; Søren Kaae Sønderby; Ulrich Paquet; Ole Winther

arXiv:1605.07571 · stat.ML · November 15, 2016 · 159 cites

TL;DR

This paper presents stochastic recurrent neural networks that combine deterministic RNNs with state space models to better propagate uncertainty, leading to improved speech modeling performance.

Contribution

It introduces a novel stochastic RNN architecture with a structured variational inference approach, enhancing uncertainty modeling in sequential neural networks.

Findings

01 Significantly improves speech modeling results on Blizzard and TIMIT datasets.

02 Achieves comparable performance to state-of-the-art methods on polyphonic music modeling.

03 Effectively propagates uncertainty in latent states through the model.

Abstract

How can we efficiently propagate uncertainty in a latent state representation with recurrent neural networks? This paper introduces stochastic recurrent neural networks which glue a deterministic recurrent neural network and a state space model together to form a stochastic and sequential neural generative model. The clear separation of deterministic and stochastic layers allows a structured variational inference network to track the factorization of the model's posterior distribution. By retaining both the nonlinear recursive structure of a recurrent neural network and averaging over the uncertainty in a latent path, like a state space model, we improve the state of the art results on the Blizzard and TIMIT speech modeling data sets by a large margin, while achieving comparable performances to competing methods on polyphonic music modeling.
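The layering described in the abstract can be illustrated with a toy ancestral-sampling loop. This is only a sketch under stated assumptions, not the authors' implementation: the function name `srnn_generate`, the plain tanh recurrence, the single-matrix parameterizations, the dimensions, and the random untrained weights are all hypothetical choices made for illustration. It shows a deterministic state `d` that depends only on the inputs `u`, and a stochastic state `z` that depends on the previous `z` and the current `d`, with the output computed from both.

```python
import numpy as np

def srnn_generate(u, d_dim=8, z_dim=4, x_dim=3, weight_seed=0, noise_seed=0):
    """Toy sampling from a model with stacked deterministic and stochastic layers.

    u: array of shape (T, u_dim) holding the input sequence.
    Returns the deterministic states, the sampled latent states, and output means.
    All weights are random and untrained (illustrative assumption).
    """
    rng_w = np.random.default_rng(weight_seed)  # draws the hypothetical weights
    rng_n = np.random.default_rng(noise_seed)   # draws the latent noise
    T, u_dim = u.shape

    W_d = rng_w.normal(0.0, 0.3, (d_dim, d_dim + u_dim))   # deterministic recurrence
    W_mu = rng_w.normal(0.0, 0.3, (z_dim, z_dim + d_dim))  # latent transition mean
    W_sig = rng_w.normal(0.0, 0.3, (z_dim, z_dim + d_dim)) # latent transition scale
    W_x = rng_w.normal(0.0, 0.3, (x_dim, z_dim + d_dim))   # output mean

    d = np.zeros(d_dim)
    z = np.zeros(z_dim)
    ds, zs, xs = [], [], []
    for t in range(T):
        # Deterministic layer: depends only on its own past and the input.
        d = np.tanh(W_d @ np.concatenate([d, u[t]]))
        # Stochastic layer: Gaussian transition conditioned on z_{t-1} and d_t.
        h = np.concatenate([z, d])
        mu = W_mu @ h
        sigma = np.exp(0.5 * np.tanh(W_sig @ h))  # positive, bounded std dev
        z = mu + sigma * rng_n.normal(size=z_dim)
        # Output mean conditioned on both the stochastic and deterministic state.
        x_mean = W_x @ np.concatenate([z, d])
        ds.append(d)
        zs.append(z)
        xs.append(x_mean)
    return np.stack(ds), np.stack(zs), np.stack(xs)
```

Calling the function twice with the same `weight_seed` but different `noise_seed` values gives identical deterministic states and different latent paths, which is the separation of deterministic and stochastic layers that the abstract refers to.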

Code & Models

Repositories

marcofraccaro/srnn (Official)

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Model Reduction and Neural Networks