A weighted-variance variational autoencoder model for speech enhancement

Ali Golmakani (MULTISPEECH), Mostafa Sadeghi (MULTISPEECH), Xavier Alameda-Pineda (ROBOTLEARN), Romain Serizel (MULTISPEECH)

arXiv:2211.00990 · cs.SD · October 27, 2023

TL;DR

This paper introduces a weighted-variance variational autoencoder for speech enhancement, which improves robustness by weighting spectrogram frames and modeling speech with a Student's t-distribution.

Contribution

The paper proposes a weighted-variance generative model with a Gamma prior on the frame weights, which the authors report yields more effective speech enhancement algorithms than the standard unweighted Gaussian model.

Findings

01. Enhanced speech quality in experiments

02. Robustness to noise variations

03. Outperforms standard Gaussian-based models

Abstract

We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. In contrast to this commonly used approach, we propose a weighted variance generative model, where the contribution of each spectrogram time-frame in parameter learning is weighted. We impose a Gamma prior distribution on the weights, which would effectively lead to a Student's t-distribution instead of Gaussian for speech generative modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram auto-encoding and speech enhancement demonstrate the effectiveness and…
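The link between a Gamma prior on the weights and a Student's t-distribution can be checked numerically. The sketch below is an illustration, not the paper's implementation. It assumes a real-valued (not complex-valued) Gaussian, a single fixed scale `sigma` instead of a decoder-produced variance, and a Gamma(nu/2, rate nu/2) prior, a parameterization chosen here so that the marginal is a standard Student's t with `nu` degrees of freedom. The function name `sample_weighted_variance` is hypothetical.

```python
import numpy as np

def sample_weighted_variance(nu, sigma, n, seed=0):
    """Draw n samples from a Gaussian whose variance is divided by a Gamma weight.

    Assumed setup (illustrative only): w ~ Gamma(shape=nu/2, rate=nu/2),
    x | w ~ N(0, sigma^2 / w). Marginally, x follows a Student's t-distribution
    with nu degrees of freedom and scale sigma.
    """
    rng = np.random.default_rng(seed)
    # NumPy parameterizes the Gamma by shape and scale, with scale = 1 / rate.
    w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    # Conditional on w, the sample is zero-mean Gaussian with variance sigma^2 / w.
    return rng.normal(0.0, 1.0, size=n) * sigma / np.sqrt(w)

nu, sigma, n = 10.0, 1.0, 200_000
x = sample_weighted_variance(nu, sigma, n)

# Reference: direct Student's t samples with the same degrees of freedom and scale.
t_ref = sigma * np.random.default_rng(1).standard_t(nu, size=n)

qs = [0.01, 0.25, 0.5, 0.75, 0.99]
print("mixture quantiles  :", np.round(np.quantile(x, qs), 2))
print("Student-t quantiles:", np.round(np.quantile(t_ref, qs), 2))
# For nu > 2 the Student's t variance is sigma^2 * nu / (nu - 2).
print("sample variance:", x.var(), "theory:", sigma**2 * nu / (nu - 2))
```

The two sets of quantiles should agree up to sampling noise. Small weights inflate the variance of individual samples, which produces the heavier tails of the Student's t-distribution relative to a Gaussian.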

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques