A Recurrent Variational Autoencoder for Speech Enhancement
Simon Leglaive (IETR), Xavier Alameda-Pineda (PERCEPTION), Laurent Girin (GIPSA-CRISSP, PERCEPTION), Radu Horaud (PERCEPTION)

TL;DR
This paper introduces a recurrent variational autoencoder-based generative model for speech enhancement that leverages temporal dynamics and fine-tuning at test time to improve noise reduction performance.
Contribution
The paper presents a recurrent deep generative speech model combined with a variational EM algorithm, improving speech enhancement by modeling the temporal dynamics of speech.
Findings
Improved speech enhancement results over feed-forward models
Effective fine-tuning of the encoder at test time
Better modeling of temporal dependencies in speech
Abstract
This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE). The deep generative speech model is trained using clean speech signals only, and it is combined with a nonnegative matrix factorization noise model for speech enhancement. We propose a variational expectation-maximization algorithm where the encoder of the RVAE is fine-tuned at test time, to approximate the distribution of the latent variables given the noisy speech observations. Compared with previous approaches based on feed-forward fully-connected architectures, the proposed recurrent deep generative speech model induces a posterior temporal dynamic over the latent variables, which is shown to improve the speech enhancement results.
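To make the combination of a speech model and an NMF noise model more concrete, here is a minimal NumPy sketch. It assumes a Gaussian model in which the noisy STFT is the sum of independent speech and noise coefficients, so the speech can be estimated with a Wiener-like gain; the abstract does not spell out this estimator, so treat it as an illustration rather than the authors' exact procedure. The function names, the array sizes, and the random `speech_var` (a stand-in for the variance the RVAE decoder would output) are all hypothetical.

```python
import numpy as np

def nmf_noise_variance(W, H):
    """Noise variance (F x N) as the product of two nonnegative factors."""
    return W @ H

def wiener_enhance(noisy_stft, speech_var, noise_var, eps=1e-10):
    """Scale each time-frequency bin by speech_var / (speech_var + noise_var)."""
    gain = speech_var / (speech_var + noise_var + eps)
    return gain * noisy_stft

rng = np.random.default_rng(0)
F, N, K = 5, 8, 2                      # frequency bins, frames, NMF rank (illustrative)

W = rng.random((F, K))                 # nonnegative spectral templates
H = rng.random((K, N))                 # nonnegative temporal activations
noise_var = nmf_noise_variance(W, H)

# Stand-in for the speech variance produced by the RVAE decoder (assumption).
speech_var = rng.random((F, N))

# Synthetic complex-valued noisy STFT, only for demonstration.
noisy_stft = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))

speech_est = wiener_enhance(noisy_stft, speech_var, noise_var)
```

Because the gain lies between 0 and 1 in every bin, the estimate never has a larger magnitude than the noisy input. In the approach described above, the speech variance would depend on latent variables inferred by the fine-tuned encoder rather than being drawn at random.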
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis
