Disentanglement Learning for Variational Autoencoders Applied to Audio-Visual Speech Enhancement

Guillaume Carbajal; Julius Richter; Timo Gerkmann

arXiv:2105.08970 · eess.AS · January 4, 2022

TL;DR

This paper introduces an adversarial training method that disentangles a high-level speech attribute label (e.g., speech activity) from the other latent variables of a variational autoencoder (VAE), improving audio-visual speech enhancement.

Contribution

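It proposes an adversarial training scheme in which a discriminator competes with the VAE encoder so that the high-level speech label is disentangled from the other latent variables, combined with an additional encoder that estimates the label for the decoder.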

Findings

01. Disentangling the label from the other latent variables improves speech enhancement quality.

02. Using visual data for voice activity detection improves performance.

03. The method outperforms the standard VAE on speech enhancement.

Abstract

Recently, the standard variational autoencoder has been successfully used to learn a probabilistic prior over speech signals, which is then used to perform speech enhancement. Variational autoencoders have since been conditioned on a label describing a high-level speech attribute (e.g., speech activity), which allows more explicit control of speech generation. However, the label is not guaranteed to be disentangled from the other latent variables, which limits the performance improvements over the standard variational autoencoder. In this work, we propose to use an adversarial training scheme for variational autoencoders to disentangle the label from the other latent variables. During training, we use a discriminator that competes with the encoder of the variational autoencoder. Simultaneously, we also use an additional encoder that estimates the label for the decoder of…
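To make the competition between discriminator and encoder concrete, here is a minimal, framework-free Python sketch of the per-example losses. It is not the authors' implementation (their code is in the PyTorch repository sp-uhh/disentangled-vae). Assumptions, all hypothetical: the label is a binary speech-activity value, the discriminator is a linear classifier on the latent vector `z` with parameters `w` and `b`, and the encoder's adversarial term follows the Fader Networks convention of rewarding the discriminator for predicting the flipped label. The paper's exact objective may differ, and the additional encoder that estimates the label for the decoder is not modeled here.

```python
import math
from typing import Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a probability p and a target in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the usual VAE regularizer."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def discriminator_prob(z: Sequence[float], w: Sequence[float], b: float) -> float:
    """Hypothetical linear discriminator: estimated P(label = 1 | z)."""
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def adversarial_losses(
    recon_loss: float,
    mu: Sequence[float],
    logvar: Sequence[float],
    z: Sequence[float],
    label: float,
    w: Sequence[float],
    b: float,
    lam: float = 1.0,
) -> Tuple[float, float]:
    """Return (disc_loss, vae_loss) for one training example.

    The discriminator is trained to recover the label from z.
    The VAE encoder is trained to make the discriminator predict the
    flipped label (Fader-Networks-style term, an assumption here),
    which pushes label information out of z.
    """
    p = discriminator_prob(z, w, b)
    disc_loss = bce(p, label)            # minimized w.r.t. discriminator params
    fool_loss = bce(p, 1.0 - label)      # encoder tries to flip the prediction
    vae_loss = recon_loss + gaussian_kl(mu, logvar) + lam * fool_loss
    return disc_loss, vae_loss


# z carries no label information: the discriminator sits at p = 0.5.
d_chance, v_chance = adversarial_losses(
    recon_loss=1.0, mu=[0.0, 0.0], logvar=[0.0, 0.0],
    z=[0.0, 0.0], label=1.0, w=[1.0, -1.0], b=0.0,
)

# z leaks the label: the discriminator is confident, so the encoder is penalized.
d_leak, v_leak = adversarial_losses(
    recon_loss=0.0, mu=[0.0, 0.0], logvar=[0.0, 0.0],
    z=[2.0, 0.0], label=1.0, w=[5.0, 0.0], b=0.0,
)
```

In training, the two losses would be minimized in alternation: the discriminator parameters on `disc_loss`, the encoder and decoder parameters on `vae_loss`. At chance level both cross-entropies equal ln 2; when `z` leaks the label, `disc_loss` drops toward zero while the adversarial term in `vae_loss` grows, which is the pressure that drives the label out of the latent variables.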

Code & Models

Repositories

sp-uhh/disentangled-vae (PyTorch)