Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement

Mostafa Sadeghi; Xavier Alameda-Pineda

arXiv:1912.10647 · eess.AS · March 10, 2021

TL;DR

This paper introduces MIN-VAE, an unsupervised audio-visual speech enhancement model that uses a mixture of inference networks to improve both the test-time initialization of the latent variables and the fusion of audio and visual data, and that outperforms standard audio-only models.

Contribution

The paper proposes MIN-VAE, a VAE with a mixture of inference networks that fuses audio and visual data for speech enhancement, learns the balance between the two modalities without supervision, and provides a better initialization of the latent variables.

Findings

01. MIN-VAE outperforms standard audio-only models.

02. The mixture inference approach improves initialization.

03. The model adaptively fuses audio and visual data.

Abstract

In this paper, we are interested in unsupervised (unknown noise) audio-visual speech enhancement based on variational autoencoders (VAEs), where the probability distribution of clean speech spectra is simulated using an encoder-decoder architecture. The trained generative model (decoder) is then combined with a noise model at test time to estimate the clean speech. In the speech enhancement phase (test time), the initialization of the latent variables, which describe the generative process of clean speech via the decoder, is crucial, as the overall inference problem is non-convex. This is usually done by using the output of the trained encoder, with the noisy audio and clean visual data given as input. Current audio-visual VAE models do not provide an effective initialization because the two modalities are tightly coupled (concatenated) in the associated architectures. To overcome this…
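To make the initialization issue concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes the mixture consists of two separate inference networks, one audio-based and one visual-based, each producing a diagonal Gaussian over the latent vector, combined with a mixing weight. The linear encoders, the weight values, the name `pi_audio`, and the choice to initialize from the visual branch alone are all hypothetical and chosen only for illustration.

```python
import math
import random


def linear_gaussian_encoder(x, w_mu, w_logvar):
    """Toy inference network (assumption: purely linear).

    Maps an input vector to the mean and log-variance of a
    diagonal Gaussian over the latent vector.
    """
    mu = [sum(w * xi for w, xi in zip(row, x)) for row in w_mu]
    logvar = [sum(w * xi for w, xi in zip(row, x)) for row in w_logvar]
    return mu, logvar


def sample_gaussian(mu, logvar, rng):
    """Reparameterized draw: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def sample_mixture_posterior(audio, visual, params, pi_audio, rng):
    """Draw z from pi_audio * q_a(z | audio) + (1 - pi_audio) * q_v(z | visual).

    Returns the sample and the name of the component that produced it.
    """
    branch = "audio" if rng.random() < pi_audio else "visual"
    x = audio if branch == "audio" else visual
    mu, logvar = linear_gaussian_encoder(x, *params[branch])
    return sample_gaussian(mu, logvar, rng), branch


def init_latent_from_visual(visual, params):
    """Hypothetical test-time initialization: use the posterior mean of the
    visual branch only, since the visual input is clean while the audio
    input is noisy."""
    mu, _ = linear_gaussian_encoder(visual, *params["visual"])
    return mu


# Hypothetical weights: 3-dim audio input, 2-dim visual input, 2-dim latent.
params = {
    "audio": ([[0.1, 0.2, 0.3], [0.0, -0.1, 0.1]],
              [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
    "visual": ([[1.0, 0.0], [0.0, 1.0]],
               [[0.0, 0.0], [0.0, 0.0]]),
}

rng = random.Random(0)
z, branch = sample_mixture_posterior([1.0, 0.0, 0.0], [0.5, -0.5],
                                     params, pi_audio=0.5, rng=rng)
z0 = init_latent_from_visual([0.5, -0.5], params)
```

Because the two branches are separate rather than concatenated, the visual branch can be evaluated on its own. A single encoder that takes the concatenated noisy audio and visual input does not allow this.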
