TL;DR
This paper introduces an unsupervised speech enhancement method based on dynamical variational autoencoders (DVAEs). The method models temporal dependencies in speech, needs only clean speech at training time, and outperforms VAE-based and baseline methods, especially on unseen noise types.
Contribution
It extends VAE-based speech enhancement to DVAEs, combining speech dynamics modeling with unsupervised learning for improved noise robustness.
Findings
DVAE-based method outperforms VAE-based and baseline methods
Effective on unseen noise types
Versatile framework with three DVAE models
Abstract
Dynamical variational autoencoders (DVAEs) are a class of deep generative models with latent variables, dedicated to model time series of high-dimensional data. DVAEs can be considered as extensions of the variational autoencoder (VAE) that include temporal dependencies between successive observed and/or latent vectors. Previous work has shown the interest of using DVAEs over the VAE for speech spectrograms modeling. Independently, the VAE has been successfully applied to speech enhancement in noise, in an unsupervised noise-agnostic set-up that requires neither noise samples nor noisy speech samples at training time, but only requires clean speech signals. In this paper, we extend these works to DVAE-based single-channel unsupervised speech enhancement, hence exploiting both speech signals unsupervised representation learning and dynamics modeling. We propose an unsupervised speech…
