IR-GAN: Room Impulse Response Generator for Far-field Speech Recognition

Anton Ratnarajah; Zhenyu Tang; Dinesh Manocha

arXiv:2010.13219 · cs.SD · April 8, 2021

TL;DR

This paper introduces IR-GAN, a GAN-based method for generating realistic synthetic room impulse responses to enhance far-field speech recognition accuracy in new environments.

Contribution

IR-GAN is a GAN-based approach that extracts acoustic parameters from captured real-world RIRs and uses them to synthesize realistic new RIRs, improving far-field speech recognition in environments not seen during training.

Findings

01

IR-GAN-generated RIRs yield up to an 8.95% lower error rate than the Geometric Acoustic Simulator (GAS) in far-field speech recognition benchmarks

02

Synthetic RIRs reduce word error rate by up to 14.3%

03

Combining IR-GAN RIRs with GAS further improves results

Abstract

We present a Generative Adversarial Network (GAN) based room impulse response generator (IR-GAN) for generating realistic synthetic room impulse responses (RIRs). IR-GAN extracts acoustic parameters from captured real-world RIRs and uses these parameters to generate new synthetic RIRs. We use these generated synthetic RIRs to improve far-field automatic speech recognition in new environments that are different from the ones used in training datasets. In particular, we augment the far-field speech training set by convolving our synthesized RIRs with a clean LibriSpeech dataset. We evaluate the quality of our synthetic RIRs on the real-world LibriSpeech test set created using real-world RIRs from the BUT ReverbDB and AIR datasets. Our IR-GAN reports up to an 8.95% lower error rate than Geometric Acoustic Simulator (GAS) in far-field speech recognition benchmarks. We further improve the…
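The augmentation step described in the abstract, convolving an RIR with clean speech to simulate a far-field recording, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `toy_rir` is a hypothetical stand-in for a generated RIR, the sine tone stands in for a LibriSpeech utterance, and the peak rescaling is an illustrative choice that the abstract does not specify.

```python
import math
import random


def convolve(signal, rir):
    """Direct (time-domain) linear convolution of two 1-D sequences."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # zero samples contribute nothing
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def reverberate(clean, rir):
    """Simulate far-field speech: convolve clean speech with an RIR,
    trim to the original length, and rescale to the clean signal's peak.
    (Trimming and peak rescaling are assumptions made for this sketch.)"""
    wet = convolve(clean, rir)[: len(clean)]
    peak_clean = max(abs(x) for x in clean)
    peak_wet = max(abs(x) for x in wet)
    if peak_wet == 0.0:
        return wet
    scale = peak_clean / peak_wet
    return [x * scale for x in wet]


def toy_rir(length, decay, seed=0):
    """Hypothetical stand-in for a generated RIR: a unit direct-path
    impulse followed by exponentially decaying noise."""
    rng = random.Random(seed)
    tail = [rng.uniform(-1.0, 1.0) * math.exp(-decay * n) for n in range(1, length)]
    return [1.0] + tail


# Toy "clean speech": a short 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]
rir = toy_rir(length=200, decay=0.02)
far_field = reverberate(clean, rir)
```

The direct convolution keeps the sketch dependency-free. For full-length utterances and RIRs, an FFT-based routine such as `scipy.signal.fftconvolve` would be the more practical choice.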

Code & Models

Repositories

GAMMA-UMD/IR-GAN (TensorFlow, official)
