End-to-End Speech Recognition With Joint Dereverberation Of Sub-Band Autoregressive Envelopes
Rohit Kumar, Anurenjan Purushothaman, Anirudh Sreeram, Sriram Ganapathy

TL;DR
This paper introduces a neural envelope enhancement model for end-to-end speech recognition in reverberant environments, jointly optimized with ASR to improve recognition accuracy by dereverberating sub-band temporal envelopes.
Contribution
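It proposes a joint optimization framework that combines envelope dereverberation with E2E ASR. A neural enhancement model built from convolutional and LSTM layers operates on FDLP sub-band temporal envelopes and is trained together with a transformer-based E2E ASR system.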
Findings
Relative improvement of 21% on the REVERB challenge dataset
Relative improvement of 10% on the VOiCES dataset
Effective joint modeling of dereverberation and recognition
Abstract
The end-to-end (E2E) automatic speech recognition (ASR) systems are often required to operate in reverberant conditions, where the long-term sub-band envelopes of the speech are temporally smeared. In this paper, we develop a feature enhancement approach using a neural model operating on sub-band temporal envelopes. The temporal envelopes are modeled using the framework of frequency domain linear prediction (FDLP). The neural enhancement model proposed in this paper performs an envelope gain based enhancement of temporal envelopes. The model architecture consists of a combination of convolutional and long short term memory (LSTM) neural network layers. Further, the envelope dereverberation, feature extraction and acoustic modeling using transformer based E2E ASR can all be jointly optimized for the speech recognition task. The joint optimization ensures that the dereverberation model…
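The FDLP envelope modeling mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it fits a single full-band autoregressive (AR) model to the DCT of a short signal, whereas the paper works on sub-band envelopes, and the function names (`dct_ii`, `levinson`, `fdlp_envelope`), the model order, and the small regularization constant are all assumptions made for illustration. The idea shown is the standard one behind FDLP: linear prediction applied to the DCT coefficients of a signal yields an AR "spectrum" that approximates the signal's temporal envelope.

```python
import numpy as np


def dct_ii(x):
    """Orthonormal DCT-II via an explicit cosine matrix (adequate for short signals)."""
    n = len(x)
    k = np.arange(n)[:, None]          # DCT coefficient index
    t = np.arange(n)[None, :]          # time index
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)


def levinson(r, order):
    """Levinson-Durbin recursion: autocorrelation r[0..order] -> (AR coeffs with a[0]=1, error)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # acc = r[i] + sum_{j=1}^{i-1} a[j] * r[i-j]
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order=20, n_points=None):
    """Approximate the temporal envelope of x with an AR model fitted to its DCT.

    Hypothetical helper for illustration; full-band only, no sub-band windowing.
    """
    n = len(x)
    n_points = n_points or n
    c = dct_ii(np.asarray(x, dtype=float))
    # Autocorrelation of the DCT sequence (linear prediction runs along frequency).
    r = np.array([np.dot(c[:n - lag], c[lag:]) for lag in range(order + 1)])
    r[0] *= 1.0 + 1e-9                 # assumed tiny regularization for numerical stability
    a, err = levinson(r, order)
    # Evaluating the AR model on [0, pi) maps to time positions 0..n-1.
    w = np.pi * np.arange(n_points) / n_points
    denom = np.exp(-1j * np.outer(w, np.arange(order + 1))) @ a
    return err / np.abs(denom) ** 2
```

Calling `fdlp_envelope(x, order=20)` on a short frame returns one positive envelope value per sample. A sub-band variant would window the DCT coefficients into bands before the linear-prediction step, and an enhancement model of the kind the abstract describes would then operate on those per-band envelopes.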
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
