Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement

Mostafa Sadeghi; Xavier Alameda-Pineda

arXiv:2102.04144 · eess.AS · February 9, 2021


TL;DR

This paper introduces Switching Variational Auto-Encoders (SwVAE), which switch over time between audio-only and audio-visual VAE-based speech models, improving noise-agnostic speech enhancement when the visual stream is unreliable (e.g., the speaker is not facing the camera).

Contribution

The paper proposes a novel unsupervised switching VAE framework with a latent Markovian variable to adaptively combine models for noise-agnostic audio-visual speech enhancement.
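To make the role of a latent Markovian switching variable concrete, here is a minimal sketch, assuming the switch m_t takes one of K discrete values (e.g. K = 2 for audio-only vs. audio-visual), a known transition matrix, and given per-frame log-likelihoods of the observation under each architecture. It computes the posterior probability of each architecture per frame with the standard forward-backward algorithm. The function `switch_posteriors` and all numbers are hypothetical illustrations; the paper's actual inference procedure may differ.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def switch_posteriors(log_lik, log_trans, log_init):
    """Forward-backward posteriors p(m_t = k | x_1..x_T) for a discrete Markov switch.

    log_lik[t][k]   : log p(x_t | m_t = k), e.g. scored by architecture k (hypothetical input)
    log_trans[j][k] : log p(m_t = k | m_{t-1} = j)
    log_init[k]     : log p(m_1 = k)
    """
    T, K = len(log_lik), len(log_init)
    # Forward pass: log_alpha[t][k] = log p(x_1..x_t, m_t = k)
    log_alpha = [[0.0] * K for _ in range(T)]
    for k in range(K):
        log_alpha[0][k] = log_init[k] + log_lik[0][k]
    for t in range(1, T):
        for k in range(K):
            log_alpha[t][k] = log_lik[t][k] + logsumexp(
                [log_alpha[t - 1][j] + log_trans[j][k] for j in range(K)])
    # Backward pass: log_beta[t][j] = log p(x_{t+1}..x_T | m_t = j)
    log_beta = [[0.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for j in range(K):
            log_beta[t][j] = logsumexp(
                [log_trans[j][k] + log_lik[t + 1][k] + log_beta[t + 1][k]
                 for k in range(K)])
    # Combine forward and backward messages, then normalize per frame
    post = []
    for t in range(T):
        scores = [log_alpha[t][k] + log_beta[t][k] for k in range(K)]
        norm = logsumexp(scores)
        post.append([math.exp(s - norm) for s in scores])
    return post

# Usage: two architectures with "sticky" transitions (hypothetical values)
stay = 0.9
log_trans = [[math.log(stay), math.log(1 - stay)],
             [math.log(1 - stay), math.log(stay)]]
log_init = [math.log(0.5)] * 2
# Frames 0-2 favor model 0 (e.g. audio-visual, clean video);
# frames 3-5 favor model 1 (e.g. audio-only, cluttered video)
log_lik = [[-1.0, -3.0]] * 3 + [[-3.0, -1.0]] * 3
post = switch_posteriors(log_lik, log_trans, log_init)
for t, p in enumerate(post):
    print(t, [round(v, 3) for v in p])
```

The sticky transition matrix encodes the Markovian prior: the switch tends to keep using the same architecture across neighboring frames unless the per-frame evidence clearly favors another one.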

Findings

01

SwVAE outperforms traditional models in noisy and cluttered visual conditions.

02

The model effectively switches between architectures to optimize speech enhancement.

03

Experimental results demonstrate promising performance of SwVAE.

Abstract

Recently, audio-visual speech enhancement has been tackled in an unsupervised setting based on variational auto-encoders (VAEs): during training, only clean data are used to train a generative model for speech, which at test time is combined with a noise model, e.g., nonnegative matrix factorization (NMF), whose parameters are learned without supervision. Consequently, the proposed model is agnostic to the noise type. When visual data are clean, audio-visual VAE-based architectures usually outperform the audio-only counterpart. The opposite happens when the visual data are corrupted by clutter, e.g. the speaker not facing the camera. In this paper, we propose to find the optimal combination of these two architectures through time. More precisely, we introduce the use of a latent sequential variable with Markovian dependencies to switch between different VAE architectures through…
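The test-time combination of a speech model with an unsupervised NMF noise model can be illustrated with a minimal sketch, assuming the Gaussian formulation common in VAE-NMF speech enhancement: each noisy STFT bin has variance equal to a speech variance (which would come from the VAE decoder) plus a noise variance (W H)_ft. Here the speech variance is a fixed placeholder (hypothetical), W and H are fitted to the noisy power spectrogram with Itakura-Saito-style multiplicative updates while the speech term is held fixed, and the clean-speech estimate is the Wiener-style posterior mean. The functions `nmf_noise_given_speech` and `wiener_estimate` are hypothetical names, and this is not the paper's full algorithm (which also infers the VAE latent codes and the switching variable).

```python
import random

def nmf_noise_given_speech(V, speech_var, rank, n_iter=200, seed=0):
    """Fit noise variance W @ H to power spectrogram V with speech variance held fixed.

    V, speech_var: F rows, each a list of T positive floats.
    Returns the F x T noise variance (W H)_{ft}.
    """
    F, T = len(V), len(V[0])
    rng = random.Random(seed)
    W = [[rng.uniform(0.5, 1.5) for _ in range(rank)] for _ in range(F)]
    H = [[rng.uniform(0.5, 1.5) for _ in range(T)] for _ in range(rank)]
    eps = 1e-12

    def model_var():
        # Total variance per bin: speech variance + NMF noise variance
        return [[speech_var[f][t] + sum(W[f][k] * H[k][t] for k in range(rank)) + eps
                 for t in range(T)] for f in range(F)]

    for _ in range(n_iter):
        # Multiplicative update for H (keeps entries nonnegative)
        Vh = model_var()
        for k in range(rank):
            for t in range(T):
                num = sum(W[f][k] * V[f][t] / Vh[f][t] ** 2 for f in range(F))
                den = sum(W[f][k] / Vh[f][t] for f in range(F))
                H[k][t] *= num / (den + eps)
        # Multiplicative update for W
        Vh = model_var()
        for f in range(F):
            for k in range(rank):
                num = sum(H[k][t] * V[f][t] / Vh[f][t] ** 2 for t in range(T))
                den = sum(H[k][t] / Vh[f][t] for t in range(T))
                W[f][k] *= num / (den + eps)
    return [[sum(W[f][k] * H[k][t] for k in range(rank)) for t in range(T)]
            for f in range(F)]

def wiener_estimate(X, speech_var, noise_var):
    """Posterior mean of clean speech per bin: s_hat = sv / (sv + nv) * x."""
    return [[speech_var[f][t] / (speech_var[f][t] + noise_var[f][t]) * X[f][t]
             for t in range(len(X[0]))] for f in range(len(X))]

# Usage on hypothetical toy data: 4 frequency bins x 6 frames of complex STFT values
X = [[complex(1 + f, t - 2) for t in range(6)] for f in range(4)]
V = [[abs(x) ** 2 for x in row] for row in X]
# Placeholder for the speech variance a VAE decoder would produce (hypothetical constant)
speech_var = [[2.0] * 6 for _ in range(4)]
noise_var = nmf_noise_given_speech(V, speech_var, rank=2)
S_hat = wiener_estimate(X, speech_var, noise_var)
```

Because the noise parameters are estimated only from the noisy input, nothing about the noise type is learned in advance, which is what makes this family of models noise-agnostic.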

