TL;DR
This paper introduces IR-GAN, a GAN-based method for generating realistic synthetic room impulse responses to enhance far-field speech recognition accuracy in new environments.
Contribution
IR-GAN is a GAN-based approach that extracts acoustic parameters from captured real-world RIRs and uses them to synthesize new, realistic RIRs, improving far-field speech recognition in unseen environments.
Findings
IR-GAN-generated RIRs yield up to an 8.95% lower error rate than the Geometric Acoustic Simulator (GAS) in far-field speech recognition benchmarks
Synthetic RIRs reduce word error rate by up to 14.3%
Combining IR-GAN RIRs with GAS further improves results
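For readers unfamiliar with the metric behind these findings, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name `word_error_rate` is hypothetical, and this summary does not state which scoring tool the authors used or whether the reported reductions are relative or absolute.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
example_wer = word_error_rate("turn the lights on", "turn the light on")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.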
Abstract
We present a Generative Adversarial Network (GAN) based room impulse response generator (IR-GAN) for generating realistic synthetic room impulse responses (RIRs). IR-GAN extracts acoustic parameters from captured real-world RIRs and uses these parameters to generate new synthetic RIRs. We use these generated synthetic RIRs to improve far-field automatic speech recognition in new environments that are different from the ones used in training datasets. In particular, we augment the far-field speech training set by convolving our synthesized RIRs with a clean LibriSpeech dataset. We evaluate the quality of our synthetic RIRs on the real-world LibriSpeech test set created using real-world RIRs from the BUT ReverbDB and AIR datasets. Our IR-GAN reports up to an 8.95% lower error rate than Geometric Acoustic Simulator (GAS) in far-field speech recognition benchmarks. We further improve the…
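The augmentation step described in the abstract, convolving an RIR with clean speech to obtain far-field speech, can be sketched as follows. This is a minimal illustration assuming NumPy and single-channel signals at a shared sample rate; the function name `reverberate`, the peak rescaling, and the toy RIR values are assumptions made for illustration, not details of the authors' pipeline.

```python
import numpy as np


def reverberate(clean, rir):
    """Simulate far-field speech by convolving clean speech with an RIR."""
    clean = np.asarray(clean, dtype=np.float64)
    rir = np.asarray(rir, dtype=np.float64)

    # Full linear convolution, truncated to the clean signal's length so the
    # augmented utterance stays aligned with its transcript.
    wet = np.convolve(clean, rir, mode="full")[: len(clean)]

    # Assumption: rescale to the clean signal's peak so the augmented
    # utterance keeps a comparable level.
    peak = np.max(np.abs(wet))
    if peak > 0:
        wet = wet * (np.max(np.abs(clean)) / peak)
    return wet


# Toy RIR (hypothetical values): direct path plus one echo 3 samples later.
toy_rir = np.array([1.0, 0.0, 0.0, 0.5])
impulse = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
far_field = reverberate(impulse, toy_rir)
```

Feeding a unit impulse through the function returns the RIR itself (zero-padded to the input length), which is a quick sanity check before applying it to real utterances.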
