A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder
Yang Xiang, Jesper Lisby Højvang, Morten Højfeldt Rasmussen, Mads Græsbøll Christensen

TL;DR
This paper introduces a Bayesian permutation training deep learning method using variational autoencoders for speech enhancement, enabling disentanglement of speech and noise signals and outperforming traditional DNN approaches.
Contribution
It proposes a novel Bayesian VAE framework that models both speech and noise signals, allowing supervised training and effective disentanglement of latent variables.
Findings
Disentangles speech and noise latent variables from observed signals.
Achieves higher scale-invariant SNR and speech quality scores.
Outperforms similar DNN-based speech enhancement methods.
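The scale-invariant SNR mentioned in the findings can be illustrated with a short sketch. This uses the commonly cited SI-SNR definition (project the estimate onto the reference, then compare target energy to residual energy); the paper's exact evaluation setup is not given in this summary, so the function name `si_snr`, the `eps` stabilizer, and the mean-removal step are assumptions made for illustration.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB (illustrative sketch, not the paper's code).

    `estimate` is the enhanced signal and `reference` the clean signal,
    both given as equal-length sequences of floats.
    """
    n = len(reference)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean of each signal (assumed preprocessing step).
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything not explained by the scaled reference counts as residual.
    target_energy = sum(t * t for t in target)
    residual_energy = sum((e - t) ** 2 for e, t in zip(est, target))

    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the enhanced signal leaves the score essentially unchanged, so the metric rewards waveform fidelity rather than output gain.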
Abstract
Recently, the variational autoencoder (VAE), a deep representation learning (DRL) model, has been used to perform speech enhancement (SE). However, to the best of our knowledge, current VAE-based SE methods apply the VAE only to model the speech signal, while noise is modeled using the traditional non-negative matrix factorization (NMF) model. One of the most important reasons for using NMF is that these VAE-based methods cannot disentangle the speech and noise latent variables from the observed signal. Based on Bayesian theory, this paper derives a novel variational lower bound for the VAE, which ensures that the VAE can be trained in a supervised manner and can disentangle speech and noise latent variables from the observed signal. This means that the proposed method can apply the VAE to model both speech and noise signals, which differs fundamentally from previous VAE-based SE work. More specifically,…
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation
