Towards Improved Room Impulse Response Estimation for Speech Recognition

Anton Ratnarajah; Ishwarya Ananthabhotla; Vamsi Krishna Ithapu; Pablo Hoffmann; Dinesh Manocha; Paul Calamia

arXiv:2211.04473 · cs.SD · March 21, 2023

TL;DR

This paper introduces a GAN-based method for blind room impulse response (RIR) estimation from reverberant speech. It outperforms state-of-the-art baselines on acoustic benchmarks and lowers word error rate in far-field speech recognition.

Contribution

A novel GAN architecture with energy decay relief loss for improved blind RIR estimation tailored for speech recognition applications.

Findings

01 Outperforms state-of-the-art baselines by 17% on energy decay relief
02 Achieves 22% improvement on early-reflection energy metric
03 Reduces word error rate in ASR by 6.9%

Abstract

We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of a downstream application scenario, far-field automatic speech recognition (ASR). We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators. We then propose a generative adversarial network (GAN) based architecture that encodes RIR features from reverberant speech and constructs an RIR from the encoded features, and uses a novel energy decay relief loss to optimize for capturing energy-based properties of the input reverberant speech. We show that our model outperforms the state-of-the-art baselines on acoustic benchmarks (by 17% on the energy decay relief and 22% on an early-reflection energy metric), as well as in an ASR evaluation task (by 6.9% in word error rate).
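The energy decay relief (EDR) mentioned in the abstract is commonly defined as the time-frequency counterpart of the Schroeder energy decay curve: for each frequency bin, the energy remaining in the impulse response from a given frame onward. The sketch below illustrates that general definition in NumPy and compares two RIRs by the mean absolute difference of their EDRs in dB. It is not the paper's loss: the STFT settings, the dB-domain L1 distance, and the helper names (stft_power, energy_decay_relief, edr_distance, synthetic_rir) are all assumptions made for illustration.

```python
import numpy as np


def stft_power(x, n_fft=256, hop=64):
    """Power spectrogram |STFT|^2 with a Hann window, shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def energy_decay_relief(rir, n_fft=256, hop=64, eps=1e-12):
    """EDR in dB: backward-integrated energy per frequency bin."""
    power = stft_power(rir, n_fft, hop)
    # Schroeder backward integration along time: sum of energy from frame t to the end.
    remaining = np.cumsum(power[::-1], axis=0)[::-1]
    return 10.0 * np.log10(remaining + eps)


def edr_distance(rir_est, rir_ref, n_fft=256, hop=64):
    """Illustrative distance (assumed form): mean |EDR_est - EDR_ref| in dB."""
    edr_est = energy_decay_relief(rir_est, n_fft, hop)
    edr_ref = energy_decay_relief(rir_ref, n_fft, hop)
    return float(np.mean(np.abs(edr_est - edr_ref)))


def synthetic_rir(t60, fs=16000, dur=0.5, seed=0):
    """Toy RIR (hypothetical test signal): noise with a 60 dB decay over t60 seconds."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    return rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / t60)


if __name__ == "__main__":
    short_room = synthetic_rir(t60=0.3)
    long_room = synthetic_rir(t60=0.6)
    print("EDR shape:", energy_decay_relief(short_room).shape)
    print("distance to itself:", edr_distance(short_room, short_room))
    print("distance to longer decay:", edr_distance(short_room, long_room))
```

Because the EDR accumulates energy backward in time, it is non-increasing along the time axis in every frequency bin, which makes it sensitive to differences in decay rate such as the two reverberation times used above.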

Code & Models

Repositories

anton-jeran/Speech2RIR
pytorch

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing