Posterior sampling algorithms for unsupervised speech enhancement with recurrent variational autoencoder

Mostafa Sadeghi (MULTISPEECH); Romain Serizel (MULTISPEECH)

arXiv:2309.10439 · cs.CV · September 20, 2023

TL;DR

This paper introduces efficient sampling algorithms for unsupervised speech enhancement with a recurrent variational autoencoder (RVAE). They reduce the computational cost of the variational inference used in the standard variational expectation-maximization (VEM) approach and are more robust than supervised approaches.

Contribution

It proposes Langevin dynamics and Metropolis-Hastings sampling techniques to replace variational inference in RVAE-based speech enhancement, reducing computational complexity and improving enhancement performance.
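
As an illustration only, the following Python sketch shows an unadjusted Langevin update, z ← z + η ∇ log p(z | x) + √(2η) ε with ε ~ N(0, 1), on a hypothetical scalar toy model (z ~ N(0, 1), x | z ~ N(a z, σ²)) that stands in for the RVAE. The toy model is chosen because its exact posterior is Gaussian, so the sampler can be checked against it. The function names, step size and iteration counts are assumptions; the paper's actual update rule, step-size choice and any Metropolis correction are not given in this summary.

```python
import math
import random


def grad_log_posterior(z, x, a, sigma2):
    """Gradient of log p(z | x) for the hypothetical toy model z ~ N(0, 1), x | z ~ N(a*z, sigma2)."""
    # d/dz log p(z) = -z ;  d/dz log p(x | z) = a * (x - a*z) / sigma2
    return -z + a * (x - a * z) / sigma2


def langevin_samples(x, a, sigma2, step=0.01, n_steps=20000, burn_in=2000, seed=0):
    """Unadjusted Langevin dynamics: z <- z + step * grad + sqrt(2 * step) * N(0, 1) noise."""
    rng = random.Random(seed)
    z = 0.0
    samples = []
    for k in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        z = z + step * grad_log_posterior(z, x, a, sigma2) + math.sqrt(2.0 * step) * noise
        if k >= burn_in:  # discard the transient from the initial state z = 0
            samples.append(z)
    return samples


# Toy observation and parameters (illustrative values only).
x, a, sigma2 = 1.5, 2.0, 0.5

# Exact Gaussian posterior of the toy model, used only to check the sampler.
precision = 1.0 + a * a / sigma2
exact_mean = (a * x / sigma2) / precision

samples = langevin_samples(x, a, sigma2)
mc_mean = sum(samples) / len(samples)
print(f"exact posterior mean {exact_mean:.3f}, Langevin estimate {mc_mean:.3f}")
```

The update needs only the gradient of the log posterior, not its normalizing constant, which is why Langevin dynamics suits an intractable posterior. Without a Metropolis correction, a finite step size slightly biases the sampled distribution, so the step size trades accuracy against mixing speed.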

Findings

1. The sampling methods outperform VEM in both efficiency and accuracy.

2. The proposed algorithms generalize well to mismatched conditions.

3. They outperform a state-of-the-art supervised approach based on diffusion models.

Abstract

In this paper, we address the unsupervised speech enhancement problem based on recurrent variational autoencoder (RVAE). This approach offers promising generalization performance over the supervised counterpart. Nevertheless, the involved iterative variational expectation-maximization (VEM) process at test time, which relies on a variational inference method, results in high computational complexity. To tackle this issue, we present efficient sampling techniques based on Langevin dynamics and Metropolis-Hastings algorithms, adapted to the EM-based speech enhancement with RVAE. By directly sampling from the intractable posterior distribution within the EM process, we circumvent the intricacies of variational inference. We conduct a series of experiments, comparing the proposed methods with VEM and a state-of-the-art supervised speech enhancement approach based on diffusion models. The…
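
A minimal sketch of replacing the variational E-step with sampling follows. It assumes a hypothetical scalar toy model (z ~ N(0, 1), x | z ~ N(a z, σ²)) as a stand-in for the RVAE. For each observation, a random-walk Metropolis-Hastings chain draws z from p(z | x), and the M-step re-estimates a single noise variance σ² from those samples (Monte Carlo EM). The paper's actual noise model, M-step updates and sampler settings are not described in this summary, so every name and parameter here is illustrative.

```python
import math
import random


def log_posterior(z, x, a, sigma2):
    """Unnormalised log p(z | x) for the hypothetical toy model z ~ N(0, 1), x | z ~ N(a*z, sigma2)."""
    return -0.5 * z * z - 0.5 * (x - a * z) ** 2 / sigma2


def mh_chain(x, a, sigma2, z0, n_steps, prop_std, rng):
    """Random-walk Metropolis-Hastings targeting p(z | x); returns the state after every step."""
    z, lp = z0, log_posterior(z0, x, a, sigma2)
    states = []
    for _ in range(n_steps):
        z_new = z + prop_std * rng.gauss(0.0, 1.0)  # symmetric proposal, so q cancels in the ratio
        lp_new = log_posterior(z_new, x, a, sigma2)
        # Accept with probability min(1, p(z_new | x) / p(z | x)); only the unnormalised density is needed.
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            z, lp = z_new, lp_new
        states.append(z)
    return states


def monte_carlo_em(xs, a, sigma2=3.0, n_iters=30, n_steps=80, burn_in=20, seed=0):
    """Estimate the noise variance sigma2; MH samples replace the exact E-step posterior."""
    rng = random.Random(seed)
    zs = [0.0] * len(xs)  # each chain is warm-started from its last state in the previous iteration
    for _ in range(n_iters):
        sq_err, count = 0.0, 0
        for t, x in enumerate(xs):
            # E-step (Monte Carlo): sample z_t from p(z_t | x_t) under the current sigma2.
            states = mh_chain(x, a, sigma2, zs[t], n_steps, prop_std=1.0, rng=rng)
            zs[t] = states[-1]
            for z in states[burn_in:]:
                sq_err += (x - a * z) ** 2
                count += 1
        sigma2 = sq_err / count  # M-step: mean squared residual under the sampled posteriors
    return sigma2


# Synthetic data from the toy model with a = 1 and true sigma2 = 1 (illustrative values only).
a, true_sigma2 = 1.0, 1.0
data_rng = random.Random(1)
xs = [a * data_rng.gauss(0.0, 1.0) + math.sqrt(true_sigma2) * data_rng.gauss(0.0, 1.0)
      for _ in range(300)]

# Closed-form maximum-likelihood estimate, since marginally x ~ N(0, a^2 + sigma2).
mle_sigma2 = sum(x * x for x in xs) / len(xs) - a * a
mcem_sigma2 = monte_carlo_em(xs, a)
print(f"closed-form MLE {mle_sigma2:.3f}, Monte Carlo EM {mcem_sigma2:.3f}")
```

Because the toy marginal likelihood is available in closed form, the maximum-likelihood estimate serves as a check on the sampler-driven EM. Warm-starting each chain from its previous state is a design choice that keeps the per-iteration burn-in short, since the posterior changes only slightly between EM iterations.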

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques

Methods: Variational Inference · Diffusion