A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training
Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

TL;DR
This paper introduces a novel two-stage deep learning approach for speech enhancement that combines variational autoencoders and adversarial training to improve speech quality and robustness.
Contribution
It proposes a two-stage speech enhancement method based on deep representation learning (DRL), using a β-VAE and adversarial training to disentangle speech and noise representations and improve signal estimation accuracy.
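For readers unfamiliar with the β-VAE, the generic objective can be sketched as a reconstruction term plus a β-weighted KL divergence between the approximate posterior and a standard normal prior. This is a minimal illustration of the general technique, not the paper's loss: the squared-error reconstruction term, the diagonal Gaussian posterior, the default beta value, and the function names gaussian_kl and beta_vae_loss are all assumptions made here for illustration.

```python
import math

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def beta_vae_loss(x, x_hat, mu, log_var, beta=4.0):
    """Generic beta-VAE objective: reconstruction error + beta * KL.

    Assumption: squared error stands in for the reconstruction term;
    the paper may use a different likelihood for speech spectra.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * gaussian_kl(mu, log_var)

# Hypothetical toy values: perfect reconstruction, posterior mean shifted by 1.
loss = beta_vae_loss([1.0, 2.0], [1.0, 2.0], mu=[1.0], log_var=[0.0], beta=4.0)
```

Setting beta above 1 penalizes the KL term more heavily, which is the usual mechanism by which a β-VAE encourages disentangled latent variables.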
Findings
Enhanced speech quality with higher STOI, PESQ, and SI-SDR scores.
Outperforms recent state-of-the-art speech enhancement algorithms.
Demonstrates robustness against inaccurate posterior estimations.
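Of the three metrics named above, SI-SDR (scale-invariant signal-to-distortion ratio) is simple enough to sketch directly. The following is a minimal illustration of the standard definition, not the evaluation code used in the paper; the function name si_sdr, the mean removal step, and the small eps guard are assumptions added here.

```python
import math

def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB between a clean reference and an estimate."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Assumption: remove the mean so both signals are zero-mean.
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]
    # Project the estimate onto the reference to get the scaled target.
    alpha = sum(r * e for r, e in zip(ref, est)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(d * d for d in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))

# Hypothetical toy signals, not data from the paper.
clean = [1.0, -1.0, 1.0, -1.0]
enhanced = [1.1, -0.9, 0.8, -1.2]
score = si_sdr(clean, enhanced)
```

Because the target is obtained by projection, rescaling the estimate leaves the score essentially unchanged, which is the property that distinguishes SI-SDR from plain SDR.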
Abstract
This paper focuses on leveraging deep representation learning (DRL) for speech enhancement (SE). In general, the performance of a deep neural network (DNN) depends heavily on how well it learns a representation of the data. However, the importance of DRL is often ignored in DNN-based SE algorithms. To obtain higher-quality enhanced speech, we propose a two-stage DRL-based SE method that uses adversarial training. In the first stage, we disentangle different latent variables, because disentangled representations can help the DNN generate better enhanced speech. Specifically, we use the β-variational autoencoder (β-VAE) algorithm to obtain the speech and noise posterior estimations and related representations from the observed signal. However, since the posteriors and representations are intractable and we can only apply a conditional assumption to estimate them, it is difficult to ensure…
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Structural Health Monitoring Techniques
