A deep representation learning speech enhancement method using $\beta$-VAE

Yang Xiang; Jesper Lisby Højvang; Morten Højfeldt Rasmussen; Mads Græsbøll Christensen

arXiv:2205.05581·eess.AS·May 12, 2022·1 cites

Open Access

TL;DR

This paper introduces a novel $\beta$-VAE-based speech enhancement method that improves latent representation disentanglement, enhances speech quality, and reduces model complexity compared to previous PVAE approaches.

Contribution

The paper proposes a $\beta$-VAE strategy that enhances representation learning in PVAE, overcoming the disentanglement-reconstruction trade-off and optimizing DNN structure.

Findings

01 Better speech and noise latent representations achieved

02 Higher scale-invariant SNR and speech quality

03 Reduced number of training parameters

Abstract

In previous work, we proposed a variational autoencoder-based (VAE) Bayesian permutation training speech enhancement (SE) method (PVAE), which indicated that the SE performance of the traditional deep neural network-based (DNN) method could be improved by deep representation learning (DRL). Building on that work, in this paper we propose to use $\beta$-VAE to further improve PVAE's representation learning ability. More specifically, our $\beta$-VAE can improve PVAE's capacity to disentangle different latent variables from the observed signal without the trade-off between disentanglement and signal reconstruction. This trade-off is widespread in previous $\beta$-VAE algorithms. Unlike those algorithms, the proposed $\beta$-VAE strategy can also be used to optimize the DNN's structure. This means that the proposed method can not only improve…
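
For readers unfamiliar with the trade-off the abstract refers to, the following is a minimal sketch of the conventional $\beta$-VAE objective, not the modified strategy proposed in this paper. It assumes a diagonal Gaussian posterior, a standard normal prior, and a squared-error reconstruction term; the function names `gaussian_kl` and `beta_vae_loss` and the default `beta=4.0` are illustrative choices, not taken from the paper.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def beta_vae_loss(x, x_hat, mu, log_var, beta=4.0):
    """Conventional beta-VAE loss: reconstruction error plus beta-weighted KL.

    Assumption: squared error stands in for the negative log-likelihood term.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * gaussian_kl(mu, log_var)
```

In this conventional form, raising `beta` above 1 puts more weight on the KL term, which tends to encourage disentangled latent variables but usually at the cost of reconstruction accuracy. That is the trade-off the abstract says the proposed strategy avoids.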

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Indoor and Outdoor Localization Technologies