Fast and efficient speech enhancement with variational autoencoders

Mostafa Sadeghi (MULTISPEECH); Romain Serizel (MULTISPEECH)

arXiv:2211.02728·cs.SD·November 8, 2022

TL;DR

This paper introduces a Langevin dynamics-based variational autoencoder method for speech enhancement that balances computational efficiency with high-quality results, outperforming existing approaches.

Contribution

It proposes a novel Langevin dynamics approach with total variation regularization for variational autoencoders in speech enhancement, improving efficiency and performance.

Findings

01 Outperforms existing speech enhancement methods

02 Balances computational efficiency with enhancement quality

03 Uses Langevin dynamics with temporal regularization

Abstract

Unsupervised speech enhancement based on variational autoencoders has shown promising performance compared with the commonly used supervised methods. This approach involves the use of a pre-trained deep speech prior along with a parametric noise model, where the noise parameters are learned from the noisy speech signal with an expectation-maximization (EM)-based method. The E-step involves an intractable latent posterior distribution. Existing algorithms to solve this step are either based on computationally heavy Markov chain Monte Carlo sampling methods and variational inference, or inefficient optimization-based methods. In this paper, we propose a new approach based on Langevin dynamics that generates multiple sequences of samples and comes with a total variation-based regularization to incorporate temporal correlations of latent vectors. Our experiments demonstrate that the…
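To make the idea of Langevin sampling with a total variation penalty concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy Gaussian target in place of the VAE-based latent posterior, and every name in it (`tv_subgradient`, `toy_log_posterior_grad`, `langevin_tv`, `tv_weight`, `noise_var`) is hypothetical. It runs several chains in parallel over a sequence of latent vectors and subtracts a subgradient of the temporal total variation from the drift term.

```python
import numpy as np

def tv_subgradient(z):
    """Subgradient of sum_t ||z[t+1] - z[t]||_1 taken along the time axis (axis=-2)."""
    s = np.sign(np.diff(z, axis=-2))   # sign of each temporal difference
    g = np.zeros_like(z)
    g[..., :-1, :] -= s                # d/dz[t] of |z[t+1] - z[t]|
    g[..., 1:, :] += s                 # d/dz[t+1] of |z[t+1] - z[t]|
    return g

def toy_log_posterior_grad(z, y, noise_var):
    """Gradient of a toy log-posterior (assumption, stands in for the VAE model):
    -||y - z||^2 / (2 * noise_var) - ||z||^2 / 2."""
    return (y - z) / noise_var - z

def langevin_tv(y, n_chains=4, n_steps=200, step=1e-2,
                tv_weight=0.1, noise_var=0.5, seed=0):
    """Run n_chains unadjusted Langevin chains over a (T, D) latent sequence."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_chains,) + y.shape)   # one sequence per chain
    for _ in range(n_steps):
        # Drift: gradient of the log-target minus the weighted TV subgradient.
        grad = toy_log_posterior_grad(z, y, noise_var) - tv_weight * tv_subgradient(z)
        # Langevin update: half-step along the drift plus Gaussian noise.
        z = z + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal(z.shape)
    return z

# Usage: 20 time frames of 3-dimensional latent vectors.
y = np.zeros((20, 3))
samples = langevin_tv(y)
print(samples.shape)  # (4, 20, 3)
```

In this sketch, `tv_weight` controls how strongly consecutive latent vectors are pulled toward each other. Setting it to zero recovers plain unadjusted Langevin dynamics on the toy target.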

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Gait Recognition and Analysis