FRMDN: Flow-based Recurrent Mixture Density Network
Seyedeh Fatemeh Razavi, Reshad Hosseini, Tina Behzad

TL;DR
This paper introduces FRMDN, a flow-based recurrent mixture density network. At each time step it fits a Gaussian mixture to a normalizing-flow transform of the target rather than to the target itself, which improves the log-likelihood fit on image and speech data.
Contribution
The paper generalizes recurrent mixture density networks by defining the Gaussian mixture on a target that has been non-linearly transformed by a normalizing flow, which improves the fit to sequence data as measured by log-likelihood.
Findings
Significantly improved log-likelihood on image sequences
Higher log-likelihood than other state-of-the-art methods on some speech and image data
Greater modeling power from the flow-based non-linear transformation
Abstract
The class of recurrent mixture density networks is an important class of probabilistic models used extensively in sequence modeling and sequence-to-sequence mapping applications. In this class of models, the density of a target sequence at each time step is modeled by a Gaussian mixture model whose parameters are given by a recurrent neural network. In this paper, we generalize recurrent mixture density networks by defining a Gaussian mixture model on a non-linearly transformed target sequence at each time step. The non-linearly transformed space is created by a normalizing flow. We observed that this model significantly improves the fit to image sequences as measured by the log-likelihood. We also applied the proposed model to some speech and image data, and observed that the model has significant modeling power, outperforming other state-of-the-art methods in terms of the log-likelihood.
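The per-time-step density described in the abstract follows from the change-of-variables rule: if z = f(y) is an invertible flow and the Gaussian mixture is defined on z, then log p(y) = log sum_k pi_k N(f(y); mu_k, sigma_k) + log |det df/dy|. The sketch below illustrates that computation only. It is not the authors' implementation: the elementwise flow z_i = y_i + a_i * tanh(y_i), the diagonal covariances, and every function and parameter name are assumptions chosen for brevity. In a real model the mixture parameters would be produced by the recurrent network at each time step; here they are passed in directly.

```python
import math


def flow_forward(y, a):
    """Hypothetical elementwise flow z_i = y_i + a_i * tanh(y_i).

    It is strictly increasing (hence invertible) when every a_i >= 0.
    Returns the transformed vector z and log|det dz/dy|.
    """
    z, log_det = [], 0.0
    for y_i, a_i in zip(y, a):
        t = math.tanh(y_i)
        z.append(y_i + a_i * t)
        # dz_i/dy_i = 1 + a_i * (1 - tanh(y_i)^2), which is positive for a_i >= 0
        log_det += math.log(1.0 + a_i * (1.0 - t * t))
    return z, log_det


def diag_gaussian_logpdf(z, mean, log_std):
    """Log density of a Gaussian with diagonal covariance, evaluated at z."""
    total = 0.0
    for z_i, m_i, ls_i in zip(z, mean, log_std):
        u = (z_i - m_i) / math.exp(ls_i)
        total += -0.5 * u * u - ls_i - 0.5 * math.log(2.0 * math.pi)
    return total


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def step_loglik(y, log_weights, means, log_stds, a):
    """Log-likelihood of one target vector y under a mixture defined on f(y).

    log_weights are assumed to be normalized (their exponentials sum to 1).
    """
    z, log_det = flow_forward(y, a)
    per_component = [
        lw + diag_gaussian_logpdf(z, m, ls)
        for lw, m, ls in zip(log_weights, means, log_stds)
    ]
    # Mixture log density in the transformed space plus the Jacobian correction.
    return logsumexp(per_component) + log_det
```

Setting every a_i to 0 makes the flow the identity, so the function reduces to the log-likelihood of an ordinary mixture density network output layer. The sequence log-likelihood would then be the sum of `step_loglik` over time steps, with parameters recomputed from the recurrent state at each step.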
Taxonomy
Topics: Advanced Data Compression Techniques · Speech Recognition and Synthesis · Bayesian Methods and Mixture Models
