Speech Dereverberation with Context-aware Recurrent Neural Networks
Joao Felipe Santos, Tiago H. Falk

TL;DR
This paper introduces a context-aware recurrent neural network model for speech dereverberation that captures both short- and long-term dependencies, outperforming a recently proposed model on objective speech quality metrics and in listening tests.
Contribution
The paper presents an RNN-based model that combines a convolutional encoder for short-term context with a recurrent network for long-term context, and that performs speech dereverberation without requiring any additional input.
Findings
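Improves on reverberant speech by up to 0.4 on PESQ, 0.3 on STOI, and 1.0 on POLQA, and outperforms a recently proposed model that selects its context according to the reverberation time.
Generalizes to real room impulse responses, different speakers, and high reverberation times, even when trained only on simulated room impulse responses.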
Listening tests favor the proposed method over benchmarks.
Abstract
In this paper, we propose a model to perform speech dereverberation by estimating the spectral magnitude of the speech signal from its reverberant counterpart. Our model is capable of extracting features that take into account both short- and long-term dependencies in the signal through a convolutional encoder (which extracts features from a short, bounded context of frames) and a recurrent neural network for extracting long-term information. Our model outperforms a recently proposed model that uses different context information depending on the reverberation time, without requiring any sort of additional input, yielding improvements of up to 0.4 on PESQ, 0.3 on STOI, and 1.0 on POLQA relative to reverberant speech. We also show our model is able to generalize to real room impulse responses even when only trained with simulated room impulse responses, different speakers, and high reverberation times.…
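The two kinds of context described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the window size, the layer sizes, the tanh nonlinearities, the plain Elman-style recurrence, and the names `stack_context`, `init_params`, and `dereverb_sketch` are all assumptions made for illustration, and the weights are random rather than trained. It only shows the data flow: a bounded window of neighbouring frames is encoded for each time step, and a recurrent state then carries information across the whole utterance before a non-negative magnitude estimate is emitted.

```python
import numpy as np

def stack_context(mag, left=2, right=2):
    """Stack each frame with its neighbours: (T, F) -> (T, (left+1+right)*F)."""
    n_frames = mag.shape[0]
    # Repeat the edge frames so the first and last frames also get a full window.
    padded = np.pad(mag, ((left, right), (0, 0)), mode="edge")
    window = left + 1 + right
    return np.concatenate([padded[i:i + n_frames] for i in range(window)], axis=1)

def init_params(n_freq, n_enc=16, n_hidden=32, left=2, right=2, seed=0):
    """Random, untrained weights; all sizes are hypothetical."""
    rng = np.random.default_rng(seed)
    width = (left + 1 + right) * n_freq
    shapes = [(width, n_enc), (n_enc, n_hidden), (n_hidden, n_hidden), (n_hidden, n_freq)]
    return [rng.normal(0.0, 0.1, size=s) for s in shapes]

def dereverb_sketch(mag, params, left=2, right=2):
    """Map reverberant magnitudes (T, F) to estimated magnitudes (T, F)."""
    w_enc, w_in, w_rec, w_out = params
    # Short-term context: one shared filter bank applied to a bounded window
    # of frames (equivalent to a 1-D convolution along the time axis).
    enc = np.tanh(stack_context(mag, left, right) @ w_enc)
    # Long-term context: a simple recurrent state updated frame by frame.
    h = np.zeros(w_rec.shape[0])
    out = np.zeros(mag.shape)
    for t in range(mag.shape[0]):
        h = np.tanh(enc[t] @ w_in + h @ w_rec)
        out[t] = np.maximum(h @ w_out, 0.0)  # magnitudes are non-negative
    return out

# Toy input standing in for a reverberant magnitude spectrogram (10 frames, 8 bins).
reverberant = np.abs(np.random.default_rng(1).normal(size=(10, 8)))
estimate = dereverb_sketch(reverberant, init_params(n_freq=8))
print(estimate.shape)
```

In a trained system the weights would be fit to pairs of reverberant and clean magnitudes; the sketch stops short of that and makes no claim about the loss or the recurrent cell used in the paper.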
