A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training

Yang Xiang; Jesper Lisby Højvang; Morten Højfeldt Rasmussen; Mads Græsbøll Christensen

arXiv:2211.09166 · eess.AS · September 28, 2023 · 1 citation

TL;DR

This paper introduces a novel two-stage deep learning approach for speech enhancement that combines variational autoencoders and adversarial training to improve speech quality and robustness.

Contribution

It proposes a two-stage speech enhancement method based on deep representation learning (DRL), using β-VAE and adversarial training to disentangle representations and improve signal estimation accuracy.

Findings

01 Enhanced speech quality with higher STOI, PESQ, and SI-SDR scores.

02 Outperforms recent state-of-the-art speech enhancement algorithms.

03 Demonstrates robustness against inaccurate posterior estimations.
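
Of the three metrics named in the findings, SI-SDR is simple enough to compute by hand. The sketch below uses the commonly cited definition: project the estimate onto the reference, then take the energy ratio of the projection to the residual, in dB. It is an illustration only and not the paper's evaluation code; the function name `si_sdr`, the mean removal step, and the `eps` guard are assumptions made for this example.

```python
import math


def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    n = len(reference)

    # Assumption: remove the mean so both signals are zero-mean.
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]

    # Optimal scaling of the reference that best explains the estimate.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(d * d for d in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Toy signals: a low-frequency "clean" tone and a higher-frequency disturbance.
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
disturbance = [math.cos(2 * math.pi * 37 * t / 200) for t in range(200)]
mild = [c + 0.1 * d for c, d in zip(clean, disturbance)]
heavy = [c + 1.0 * d for c, d in zip(clean, disturbance)]

print("mild disturbance :", si_sdr(clean, mild))
print("heavy disturbance:", si_sdr(clean, heavy))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is what "scale-invariant" refers to.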

Abstract

This paper focuses on leveraging deep representation learning (DRL) for speech enhancement (SE). In general, the performance of a deep neural network (DNN) depends heavily on how well it learns data representations. However, the importance of DRL is often overlooked in DNN-based SE algorithms. To obtain higher-quality enhanced speech, we propose a two-stage DRL-based SE method that uses adversarial training. In the first stage, we disentangle different latent variables, because disentangled representations can help a DNN generate better enhanced speech. Specifically, we use the $\beta$-variational autoencoder (VAE) algorithm to obtain the speech and noise posterior estimates and the related representations from the observed signal. However, since the posteriors and representations are intractable and we can only apply a conditional assumption to estimate them, it is difficult to ensure…
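
The first stage described in the abstract builds on the β-VAE. As a rough illustration, the generic β-VAE objective adds a reconstruction term to a KL divergence term weighted by β, with the KL taken between a diagonal Gaussian posterior and a standard normal prior. The sketch below shows only that generic form. It is not the paper's speech-and-noise formulation; the squared-error reconstruction term, the function names, and the default `beta=4.0` are assumptions chosen for this example.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def beta_vae_loss(x, x_hat, mu, log_var, beta=4.0):
    """Generic beta-VAE loss: reconstruction error plus beta-weighted KL term.

    Assumption: squared error stands in for the negative log-likelihood
    (a Gaussian decoder with fixed variance, up to constants).
    """
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction + beta * gaussian_kl(mu, log_var)


# Hypothetical values: a 3-dimensional input and a 2-dimensional latent code.
x = [0.2, -0.1, 0.4]
x_hat = [0.25, -0.05, 0.30]
mu = [0.5, -0.3]
log_var = [-0.2, 0.1]

print("KL term:", gaussian_kl(mu, log_var))
print("loss   :", beta_vae_loss(x, x_hat, mu, log_var))
```

Setting `beta` above 1 penalizes the KL term more heavily than in a standard VAE, which is the usual mechanism by which β-VAE encourages disentangled latent variables.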

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Structural Health Monitoring Techniques