Speech enhancement with frequency domain auto-regressive modeling

Anurenjan Purushothaman; Debottam Dutta; Rohit Kumar; Sriram Ganapathy

arXiv:2309.13537·eess.AS·September 26, 2023

TL;DR

This paper combines a frequency-domain autoregressive model with a dual-path LSTM architecture to jointly dereverberate speech and improve ASR, raising both speech quality and recognition accuracy in reverberant environments.

Contribution

The paper proposes a unified framework that uses envelope-carrier decomposition and a novel dual-path LSTM architecture to improve both speech quality and ASR performance under reverberation.

Findings

1. Achieved 10-24% relative improvement in ASR performance over baseline models.
2. Demonstrated significant subjective speech quality enhancements.
3. Validated effectiveness on REVERB and VOiCES datasets.

Abstract

Speech applications in far-field real world settings often deal with signals that are corrupted by reverberation. The task of dereverberation constitutes an important step to improve the audible quality and to reduce the error rates in applications like automatic speech recognition (ASR). We propose a unified framework of speech dereverberation for improving the speech quality and the ASR performance using the approach of envelope-carrier decomposition provided by an autoregressive (AR) model. The AR model is applied in the frequency domain of the sub-band speech signals to separate the envelope and carrier parts. A novel neural architecture based on dual path long short term memory (DPLSTM) model is proposed, which jointly enhances the sub-band envelope and carrier components. The dereverberated envelope-carrier signals are modulated and the sub-band signals are synthesized to…
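
The envelope-carrier decomposition described in the abstract can be illustrated with a minimal sketch of frequency domain linear prediction. This is not the authors' implementation: it works on a single full-band frame rather than on sub-band signals, it omits the DPLSTM enhancement step, and the function names (`dct_ii`, `levinson`, `fdlp_envelope`, `envelope_carrier`), the model order, and the small regularization term are all assumptions made for illustration. The idea is that an AR model fitted to the DCT of a signal has a power spectrum that approximates the signal's temporal envelope, up to a scale factor; dividing the signal by the square root of that envelope leaves the carrier.

```python
import numpy as np


def dct_ii(x):
    """Orthonormal DCT-II via an explicit cosine matrix (fine for short frames)."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * (t + 0.5) * k / n)
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)


def levinson(r, order):
    """Levinson-Durbin recursion: AR polynomial a (a[0] = 1) and residual error."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        a[1:i + 1] = prev[1:i + 1] + k * prev[i - 1::-1]
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order=12):
    """Approximate temporal envelope (up to scale) from an AR fit to the DCT of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    y = dct_ii(x)
    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(y[:n - k], y[k:]) for k in range(order + 1)])
    r[0] *= 1.0 + 1e-9  # assumed small diagonal loading for numerical stability
    a, err = levinson(r, order)
    # Evaluate A(e^{jw}) at w = pi * t / n, one point per time sample.
    spectrum = np.fft.fft(a, 2 * n)[:n]
    return err / np.abs(spectrum) ** 2


def envelope_carrier(x, order=12):
    """Split x into an AR envelope and the remaining carrier."""
    x = np.asarray(x, dtype=float)
    env = fdlp_envelope(x, order)
    carrier = x / np.sqrt(env)
    return env, carrier


if __name__ == "__main__":
    # Synthetic frame: a tone gated by a Gaussian bump, plus a small noise floor.
    rng = np.random.default_rng(0)
    t = np.arange(512)
    frame = np.exp(-0.5 * ((t - 300) / 20.0) ** 2) * np.cos(2 * np.pi * 0.2 * t)
    frame += 0.01 * rng.standard_normal(t.size)
    env, carrier = envelope_carrier(frame)
    print("envelope peaks near sample", int(np.argmax(env)))
    # Modulating the carrier with the envelope recovers the original frame.
    print("reconstruction ok:", np.allclose(carrier * np.sqrt(env), frame))
```

In a pipeline of the kind the abstract outlines, the envelope and carrier would be computed per sub-band, enhanced by the neural model, and then multiplied back together as in the last line of the demo before the sub-bands are recombined.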
