RVAE-EM: Generative speech dereverberation based on recurrent variational auto-encoder and convolutive transfer function
Pengyu Wang, Xiaofei Li

TL;DR
This paper introduces a generative speech dereverberation method that combines a recurrent variational auto-encoder (RVAE) with the convolutive transfer function (CTF) approximation, and that outperforms advanced discriminative networks on single-channel dereverberation.
Contribution
It presents a probabilistic model in which the RVAE output serves as the prior of clean speech, and it obtains the MAP estimate of clean speech iteratively via the EM algorithm, outperforming advanced discriminative networks.
Findings
Outperforms advanced discriminative networks in dereverberation tasks
Integrates network-based speech prior with CTF observation model
Effective in single-channel indoor reverberant speech scenarios
Abstract
In indoor scenes, reverberation is a crucial factor in degrading the perceived quality and intelligibility of speech. In this work, we propose a generative dereverberation method. Our approach is based on a probabilistic model utilizing a recurrent variational auto-encoder (RVAE) network and the convolutive transfer function (CTF) approximation. Unlike most previous approaches, the output of our RVAE serves as the prior of the clean speech. Our target is the maximum a posteriori (MAP) estimate of clean speech, which is obtained iteratively through the expectation maximization (EM) algorithm. The proposed method integrates the capabilities of network-based speech prior modelling and CTF-based observation modelling. Experiments on single-channel speech dereverberation show that the proposed generative method noticeably outperforms the advanced discriminative networks.
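For orientation, the CTF approximation named in the abstract is commonly written as a short convolution along the time-frame axis within each STFT frequency band: the reverberant coefficient at frame t is a weighted sum of the current and previous clean coefficients, with band-specific filter taps. The sketch below illustrates only that observation model. The function name, array shapes, and the noise-free form are assumptions made for illustration and are not taken from the paper; the RVAE prior and the EM updates are not shown.

```python
import numpy as np


def ctf_observe(S, H):
    """Apply a convolutive transfer function (CTF) in each frequency band.

    Hypothetical helper, not the authors' code.

    S: complex array of shape (F, T), clean-speech STFT coefficients.
    H: complex array of shape (F, L), CTF filter taps for each band.
    Returns X of shape (F, T) with X[f, t] = sum_l H[f, l] * S[f, t - l],
    treating frames before t = 0 as zero.
    """
    F, T = S.shape
    L = H.shape[1]
    X = np.zeros((F, T), dtype=complex)
    for l in range(min(L, T)):
        # Tap l contributes the clean signal delayed by l frames.
        X[:, l:] += H[:, l:l + 1] * S[:, :T - l]
    return X
```

Under this model, dereverberation amounts to inverting a per-band convolution whose taps are unknown, which is why the method pairs it with a learned prior on the clean speech.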
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Indoor and Outdoor Localization Technologies
