A Recurrent Variational Autoencoder for Speech Enhancement

Simon Leglaive (IETR), Xavier Alameda-Pineda (PERCEPTION), Laurent Girin (GIPSA-CRISSP, PERCEPTION), Radu Horaud (PERCEPTION)

arXiv:1910.10942 · cs.LG · February 11, 2020

Open Access

TL;DR

This paper introduces a generative speech enhancement method built on a recurrent variational autoencoder (RVAE). It models temporal dynamics in the latent variables and fine-tunes the encoder at test time to improve noise reduction.

Contribution

It presents a novel recurrent deep generative speech model combined with a variational EM algorithm, and shows that modeling temporal dynamics improves speech enhancement.

Findings

01. Improved speech enhancement results over feed-forward models

02. Effective fine-tuning of the encoder at test time

03. Better modeling of temporal dependencies in speech

Abstract

This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE). The deep generative speech model is trained using clean speech signals only, and it is combined with a nonnegative matrix factorization noise model for speech enhancement. We propose a variational expectation-maximization algorithm where the encoder of the RVAE is fine-tuned at test time, to approximate the distribution of the latent variables given the noisy speech observations. Compared with previous approaches based on feed-forward fully-connected architectures, the proposed recurrent deep generative speech model induces a posterior temporal dynamic over the latent variables, which is shown to improve the speech enhancement results.
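
The abstract does not spell out how the speech model and the noise model are combined into an enhanced signal. As a rough illustration only, the sketch below assumes the common setting where speech and noise short-time Fourier transform coefficients are independent, zero-mean complex Gaussians, so that the speech estimate is a Wiener-like gain applied to the noisy mixture. The noise variance follows a nonnegative matrix factorization, `W @ H`. The speech variance here is a random stand-in for what an RVAE decoder would output; the function names, array sizes, and the `eps` constant are all hypothetical and not taken from the paper.

```python
import numpy as np


def nmf_noise_variance(W, H):
    """Noise variance per time-frequency bin under a rank-K NMF model."""
    # W: (F, K) nonnegative spectral templates, H: (K, N) nonnegative activations.
    return W @ H


def wiener_estimate(x_stft, speech_var, noise_var, eps=1e-10):
    """Posterior-mean speech estimate under the Gaussian assumption above."""
    # eps is an illustrative guard against division by zero.
    gain = speech_var / (speech_var + noise_var + eps)
    return gain * x_stft


rng = np.random.default_rng(0)
F, N, K = 5, 8, 2  # frequency bins, time frames, NMF rank (toy sizes)

# Assumption: a fixed positive array stands in for the decoder's speech variance.
speech_var = rng.uniform(0.5, 2.0, size=(F, N))

W = rng.uniform(0.1, 1.0, size=(F, K))
H = rng.uniform(0.1, 1.0, size=(K, N))
noise_var = nmf_noise_variance(W, H)

# Toy noisy mixture in the STFT domain.
x_stft = rng.normal(size=(F, N)) + 1j * rng.normal(size=(F, N))
s_hat = wiener_estimate(x_stft, speech_var, noise_var)
```

In the method the abstract describes, the speech variance depends on latent variables whose posterior is approximated by the fine-tuned encoder, so a single fixed `speech_var` is a simplification of that step.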

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis
