End-to-End Speech Recognition With Joint Dereverberation Of Sub-Band Autoregressive Envelopes

Rohit Kumar; Anurenjan Purushothaman; Anirudh Sreeram; Sriram Ganapathy

arXiv:2108.03975 · eess.AS · February 21, 2022 · ICASSP

TL;DR

This paper introduces a neural envelope enhancement model for end-to-end speech recognition in reverberant environments, jointly optimized with ASR to improve recognition accuracy by dereverberating sub-band temporal envelopes.

Contribution

It proposes a joint optimization framework that combines envelope dereverberation with E2E ASR: a neural enhancement model operating on FDLP sub-band temporal envelopes is trained together with a transformer-based recognizer.

Findings

01 Relative improvement of 21% on the REVERB challenge dataset

02 Relative improvement of 10% on the VOiCES dataset

03 Effective joint modeling of dereverberation and recognition

Abstract

End-to-end (E2E) automatic speech recognition (ASR) systems are often required to operate in reverberant conditions, where the long-term sub-band envelopes of the speech are temporally smeared. In this paper, we develop a feature enhancement approach using a neural model operating on sub-band temporal envelopes. The temporal envelopes are modeled using the framework of frequency domain linear prediction (FDLP). The neural enhancement model proposed in this paper performs an envelope-gain-based enhancement of temporal envelopes. The model architecture consists of a combination of convolutional and long short-term memory (LSTM) neural network layers. Further, the envelope dereverberation, feature extraction and acoustic modeling using transformer-based E2E ASR can all be jointly optimized for the speech recognition task. The joint optimization ensures that the dereverberation model…
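To make the FDLP step in the abstract more concrete, here is a minimal sketch of how a temporal envelope can be estimated by linear prediction in the DCT domain. It is an illustration under stated assumptions, not the authors' implementation: the function names `dct2` and `fdlp_envelope`, the model order of 20, and the synthetic test signal are all hypothetical choices. It also works on the full band only; sub-band envelopes would, in the usual FDLP formulation, come from applying the same procedure to a windowed slice of the DCT coefficients.

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II via an explicit cosine matrix (fine for short frames)."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x

def fdlp_envelope(x, order=20):
    """Estimate the temporal envelope of x with an all-pole model of its DCT.

    Linear prediction applied to the DCT sequence yields an AR model whose
    "power spectrum" approximates the squared temporal envelope of x.
    `order` is an illustrative default, not a value taken from the paper.
    """
    n = len(x)
    y = dct2(np.asarray(x, dtype=float))

    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(y[: n - lag], y[lag:]) for lag in range(order + 1)])

    # Solve the Toeplitz normal equations R a = -r[1:] (autocorrelation method).
    idx = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    a = np.linalg.solve(r[idx], -r[1:])
    a = np.concatenate(([1.0], a))

    # Prediction error power acts as the gain of the all-pole model.
    gain = r[0] + np.dot(a[1:], r[1:])

    # Evaluate gain / |A(e^{jw})|^2 at w = pi * t / n, one point per time sample.
    omega = np.pi * np.arange(n) / n
    basis = np.exp(-1j * np.outer(omega, np.arange(order + 1)))
    return gain / np.abs(basis @ a) ** 2

if __name__ == "__main__":
    # Synthetic example: a noise burst centred at sample 300 of a 512-sample frame.
    rng = np.random.default_rng(0)
    t = np.arange(512)
    burst = rng.standard_normal(512) * np.exp(-0.5 * ((t - 300) / 20.0) ** 2)
    env = fdlp_envelope(burst, order=20)
    print("envelope peak at sample", int(np.argmax(env)))
```

In a reverberant recording such an envelope is smeared in time; the enhancement model described in the abstract would then predict a gain that is applied to the envelope before feature extraction.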

Code & Models

Repositories

iiscleap/joint_fdlp_envelope_dereverberation_e2e_asr (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing