Speech enhancement with frequency domain auto-regressive modeling
Anurenjan Purushothaman, Debottam Dutta, Rohit Kumar, Sriram Ganapathy

TL;DR
This paper applies an autoregressive (AR) model in the frequency domain of sub-band speech signals to separate envelope and carrier, then uses a dual-path LSTM to dereverberate both components jointly, improving speech quality and ASR accuracy in reverberant environments.
Contribution
It proposes a unified dereverberation framework in which the envelope-carrier decomposition from a frequency-domain AR model feeds a dual-path LSTM that enhances both components jointly, targeting both speech quality and ASR performance.
Findings
Achieved 10-24% relative improvement in ASR performance over baseline models.
Demonstrated significant subjective speech quality enhancements.
Validated effectiveness on REVERB and VOiCES datasets.
Abstract
Speech applications in far-field real world settings often deal with signals that are corrupted by reverberation. The task of dereverberation constitutes an important step to improve the audible quality and to reduce the error rates in applications like automatic speech recognition (ASR). We propose a unified framework of speech dereverberation for improving the speech quality and the ASR performance using the approach of envelope-carrier decomposition provided by an autoregressive (AR) model. The AR model is applied in the frequency domain of the sub-band speech signals to separate the envelope and carrier parts. A novel neural architecture based on dual path long short term memory (DPLSTM) model is proposed, which jointly enhances the sub-band envelope and carrier components. The dereverberated envelope-carrier signals are modulated and the sub-band signals are synthesized to…
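The envelope-carrier decomposition described in the abstract can be illustrated with a minimal frequency domain linear prediction sketch. This is not the authors' implementation: it works on a single full-band signal instead of sub-bands, omits the DPLSTM enhancement stage entirely, and the function names (`dct2`, `levinson`, `fdlp_envelope`, `envelope_carrier`), the model order, and the small white-noise floor added for numerical stability are all assumptions made for illustration. The idea it shows is that fitting an AR model to the DCT of a signal yields an all-pole estimate of the temporal envelope, and dividing the signal by the square root of that envelope leaves the carrier.

```python
import math

def dct2(x):
    """Unnormalized DCT-II, O(N^2); adequate for a short demo signal."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def levinson(r, order):
    """Levinson-Durbin recursion: AR coefficients and residual energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def fdlp_envelope(x, order=20, floor=1e-3):
    """AR model of the DCT sequence, evaluated as a temporal envelope.

    `order` and `floor` are illustrative choices, not values from the paper.
    """
    n = len(x)
    c = dct2(x)
    # Autocorrelation of the DCT coefficients (autocorrelation method).
    r = [sum(c[i] * c[i + lag] for i in range(n - lag)) for lag in range(order + 1)]
    r[0] *= 1.0 + floor  # assumed white-noise correction to keep the recursion stable
    a, gain = levinson(r, order)
    env = []
    for t in range(n):
        # Time sample t maps to "frequency" pi*(t+0.5)/n in the DCT domain.
        w = math.pi * (t + 0.5) / n
        re = sum(a[k] * math.cos(w * k) for k in range(order + 1))
        im = sum(a[k] * math.sin(w * k) for k in range(order + 1))
        env.append(gain / (re * re + im * im))
    # Rescale so the envelope carries the same total energy as the signal.
    scale = sum(v * v for v in x) / sum(env)
    return [scale * e for e in env]

def envelope_carrier(x, order=20):
    """Split x into (envelope, carrier) with x[t] == carrier[t] * sqrt(envelope[t])."""
    env = fdlp_envelope(x, order)
    carrier = [s / math.sqrt(e) for s, e in zip(x, env)]
    return env, carrier

# Demo: a Gaussian burst centered at sample 80, modulating a cosine carrier.
signal = [math.exp(-0.5 * ((t - 80) / 10.0) ** 2) * math.cos(0.9 * t)
          for t in range(256)]
envelope, carrier = envelope_carrier(signal)
```

Multiplying `carrier` by the square root of `envelope` recovers the input, which mirrors the modulation step the abstract mentions before sub-band synthesis. In the paper's pipeline, the envelope and carrier would be enhanced by the neural model before this recombination.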
