RVAE-EM: Generative speech dereverberation based on recurrent variational auto-encoder and convolutive transfer function

Pengyu Wang; Xiaofei Li

arXiv:2309.08157 · eess.AS · October 18, 2023

TL;DR

This paper introduces a generative speech dereverberation method that combines a recurrent variational auto-encoder (RVAE) with the convolutive transfer function (CTF) approximation. In single-channel experiments it noticeably outperforms advanced discriminative networks.

Contribution

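It presents a probabilistic model in which the RVAE output serves as the prior of the clean speech and the CTF approximation serves as the observation model. The clean speech is recovered by maximum a posteriori (MAP) estimation, computed iteratively with the expectation-maximization (EM) algorithm.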

Findings

01  Outperforms advanced discriminative networks in dereverberation tasks

02  Integrates network-based speech prior with CTF observation model

03  Effective in single-channel indoor reverberant speech scenarios

Abstract

In indoor scenes, reverberation is a crucial factor in degrading the perceived quality and intelligibility of speech. In this work, we propose a generative dereverberation method. Our approach is based on a probabilistic model utilizing a recurrent variational auto-encoder (RVAE) network and the convolutive transfer function (CTF) approximation. Unlike most previous approaches, the output of our RVAE serves as the prior of the clean speech. Our target is the maximum a posteriori (MAP) estimate of the clean speech, which is obtained iteratively through the expectation-maximization (EM) algorithm. The proposed method integrates the capabilities of network-based speech prior modelling and CTF-based observation modelling. Experiments on single-channel speech dereverberation show that the proposed generative method noticeably outperforms the advanced discriminative networks.
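
The CTF approximation named in the abstract models the reverberant STFT coefficients in each frequency band as a convolution, along the frame axis, of the clean-speech coefficients with a short band-wise filter. The sketch below illustrates that forward (observation) model only; it does not implement the RVAE prior or the EM iterations. The function name `ctf_observe`, the array shapes, and the optional additive Gaussian noise term are assumptions made for illustration and are not taken from the paper or its repository.

```python
import numpy as np


def ctf_observe(clean_stft, ctf, noise_std=0.0, rng=None):
    """Hypothetical helper: apply a CTF observation model in the STFT domain.

    clean_stft: complex array of shape (F, T), clean-speech STFT coefficients.
    ctf:        complex array of shape (F, L), one length-L filter per band.
    noise_std:  standard deviation of optional complex Gaussian noise
                (an assumption, included only for illustration).
    Returns a complex array of shape (F, T).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_freq, n_frames = clean_stft.shape
    reverberant = np.zeros((n_freq, n_frames), dtype=complex)

    for f in range(n_freq):
        # Convolve along the frame axis within band f, then keep the
        # first T frames so the output has the same length as the input.
        reverberant[f] = np.convolve(clean_stft[f], ctf[f])[:n_frames]

    if noise_std > 0:
        # Circular complex Gaussian noise with total variance noise_std**2.
        real = rng.standard_normal((n_freq, n_frames))
        imag = rng.standard_normal((n_freq, n_frames))
        reverberant += noise_std / np.sqrt(2) * (real + 1j * imag)

    return reverberant


# Usage: band 0 has an identity filter, band 1 delays by one frame.
clean = np.arange(6, dtype=complex).reshape(2, 3)
filters = np.array([[1, 0], [0, 1]], dtype=complex)
observed = ctf_observe(clean, filters)
```

In this toy usage, the first band passes through unchanged and the second band is shifted by one frame, which is the simplest case of the band-wise filtering that a dereverberation method has to invert.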

Code & Models

Repositories

audio-westlakeu/rvae-em (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Indoor and Outdoor Localization Technologies