Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement
Mostafa Sadeghi, Xavier Alameda-Pineda

TL;DR
This paper introduces Switching Variational Auto-Encoders (SwVAE), which dynamically switch over time between audio-only and audio-visual speech enhancement models, improving noise-agnostic performance when visual conditions are challenging.
Contribution
The paper proposes a novel unsupervised switching VAE framework in which a latent variable with Markovian dependencies adaptively selects among audio-only and audio-visual VAE models for noise-agnostic speech enhancement.
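The switching idea can be illustrated with a standard hidden-Markov forward-backward pass. This is a minimal sketch, not the authors' inference procedure. It assumes that each candidate architecture (for example an audio-only and an audio-visual VAE) has already produced a per-frame log-likelihood score for the noisy observation, collected in a hypothetical `loglik` array, and that the switching variable follows a Markov chain with a hypothetical transition matrix `A`. The function `switch_posterior` (also a hypothetical name) returns, for every frame, the posterior probability that each model is the active one.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def switch_posterior(loglik, log_trans, log_init):
    """Posterior p(s_n = m | all frames) for a Markov switching variable s_n.

    loglik:    (N, M) per-frame log-likelihood under each of M models
               (assumed given; e.g. scores from each VAE, hypothetical here).
    log_trans: (M, M) log transition probs, log_trans[i, j] = log p(s_n=j | s_{n-1}=i).
    log_init:  (M,) log initial-state probabilities.
    """
    N, M = loglik.shape
    log_alpha = np.empty((N, M))
    log_beta = np.zeros((N, M))  # beta at the last frame is log(1) = 0

    # Forward pass: alpha_n(j) = p(x_n | j) * sum_i alpha_{n-1}(i) * A[i, j]
    log_alpha[0] = log_init + loglik[0]
    for n in range(1, N):
        log_alpha[n] = loglik[n] + _logsumexp(log_alpha[n - 1][:, None] + log_trans, axis=0)

    # Backward pass: beta_n(i) = sum_j A[i, j] * p(x_{n+1} | j) * beta_{n+1}(j)
    for n in range(N - 2, -1, -1):
        log_beta[n] = _logsumexp(log_trans + (loglik[n + 1] + log_beta[n + 1])[None, :], axis=1)

    # Combine and normalize each frame to get posterior state probabilities.
    log_post = log_alpha + log_beta
    log_post -= _logsumexp(log_post, axis=1)[:, None]
    return np.exp(log_post)


# Toy usage: model 0 fits the first three frames, model 1 the last three.
loglik = np.array([[0.0, -5.0]] * 3 + [[-5.0, 0.0]] * 3)
A = np.array([[0.9, 0.1], [0.1, 0.9]])  # "sticky" transitions discourage rapid switching
post = switch_posterior(loglik, np.log(A), np.log(np.array([0.5, 0.5])))
print(post.argmax(axis=1))  # → [0 0 0 1 1 1]
```

Working in log space avoids underflow on long utterances, and a large diagonal in `A` keeps a single ambiguous frame from flipping the selected model.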
Findings
SwVAE outperforms traditional models in noisy and cluttered visual conditions.
The model effectively switches between architectures to optimize speech enhancement.
Experimental results demonstrate the promising performance of SwVAE.
Abstract
Recently, audio-visual speech enhancement has been tackled in an unsupervised setting based on variational auto-encoders (VAEs). During training, only clean data is used to learn a generative model of speech; at test time, this model is combined with a noise model, e.g. nonnegative matrix factorization (NMF), whose parameters are learned without supervision. Consequently, the resulting model is agnostic to the noise type. When visual data are clean, audio-visual VAE-based architectures usually outperform their audio-only counterpart. The opposite happens when the visual data are corrupted by clutter, e.g. when the speaker is not facing the camera. In this paper, we propose to find the optimal combination of these two architectures through time. More precisely, we introduce the use of a latent sequential variable with Markovian dependencies to switch between different VAE architectures through…
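The test-time combination of a pre-trained speech model with an unsupervised NMF noise model can be sketched as follows, under simplifying assumptions. The speech variance `speech_var` is treated as fixed (in the actual method it comes from the VAE decoder and latent inference, both omitted here); the noisy STFT is modeled as zero-mean complex Gaussian with variance `speech_var + W @ H`; and `W`, `H` are fitted with generic Itakura-Saito multiplicative updates. The function names are hypothetical, and the updates are a standard NMF technique, not necessarily the authors' exact algorithm.

```python
import numpy as np


def is_divergence(P, V):
    """Itakura-Saito divergence D_IS(P | V), summed over all time-frequency bins."""
    R = P / V
    return float(np.sum(R - np.log(R) - 1.0))


def fit_noise_nmf(X, speech_var, rank=4, n_iter=200, seed=0, eps=1e-12):
    """Fit a noise model W @ H so that |X|^2 is explained by speech_var + W @ H.

    X: (F, N) complex STFT of the noisy mixture.
    speech_var: (F, N) speech variance, held fixed here (assumed to come from a
    pre-trained speech VAE decoder; hypothetical input for this sketch).
    """
    rng = np.random.default_rng(seed)
    F, N = X.shape
    P = np.abs(X) ** 2  # observed power spectrogram
    W = rng.random((F, rank)) + eps  # nonnegative spectral templates
    H = rng.random((rank, N)) + eps  # nonnegative activations
    for _ in range(n_iter):
        # Multiplicative IS updates with exponent 1/2; speech_var stays fixed.
        V = speech_var + W @ H
        W *= (((P * V ** -2) @ H.T) / (V ** -1 @ H.T)) ** 0.5
        V = speech_var + W @ H
        H *= ((W.T @ (P * V ** -2)) / (W.T @ V ** -1)) ** 0.5
    return W, H


def wiener_estimate(X, speech_var, W, H):
    """Posterior-mean (Wiener-filter) estimate of the clean speech STFT."""
    return speech_var / (speech_var + W @ H) * X


# Toy usage on synthetic data: variances are random, not derived from real audio.
rng = np.random.default_rng(1)
F, N = 16, 40
speech_var = rng.random((F, N)) + 0.1
noise_var = rng.random((F, 2)) @ rng.random((2, N))
X = np.sqrt((speech_var + noise_var) / 2) * (
    rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))
)
W, H = fit_noise_nmf(X, speech_var)
S_hat = wiener_estimate(X, speech_var, W, H)
```

The exponent 1/2 on the update ratios is the usual majorization-minimization choice for the Itakura-Saito divergence, which keeps the fitted divergence from increasing in this sketch; the Wiener filter only rescales each bin, so the noisy phase is kept.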
