RIR-Mega-Speech: A Reverberant Speech Corpus with Comprehensive Acoustic Metadata and Reproducible Evaluation
Mandip Goswami

TL;DR
The paper introduces RIR-Mega-Speech, a reverberant speech corpus of roughly 117.5 hours with per-file acoustic metadata (RT60, DRR, clarity index) and scripts that rebuild the data and reproduce the evaluation, so that reverberant speech recognition methods can be compared on a common footing.
Contribution
It provides a reverberant speech dataset with per-file acoustic annotations, together with scripts that rebuild the dataset and reproduce all evaluation results, addressing the missing annotations and limited documentation that make existing corpora hard to compare.
Findings
With Whisper small on 1,500 paired utterances, reverberation raises WER from 5.20% to 7.70%, a paired increase of 2.50 percentage points (95% CI: 2.06 to 2.98).
WER correlates positively with RT60 and negatively with DRR.
Reverberant speech recognition performance degrades predictably with acoustic reverberation.
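A paired WER increase with a confidence interval, like the one reported above, can be measured along the following lines. This is a minimal sketch, not the paper's evaluation script: it assumes a percentile bootstrap over utterances (the summary does not say how the intervals were computed), the function names `word_errors`, `percentile`, and `paired_wer_delta` are hypothetical, and the three transcripts are made-up toy data.

```python
import random

def word_errors(ref, hyp):
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, start=1):
        cur = [i]
        for j, hw in enumerate(h, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (rw != hw)))   # substitution or match
        prev = cur
    return prev[-1]

def percentile(sorted_vals, q):
    """Nearest-rank percentile of an already sorted list, q in [0, 1]."""
    return sorted_vals[round(q * (len(sorted_vals) - 1))]

def paired_wer_delta(clean_err, reverb_err, ref_len, n_boot=2000, seed=0):
    """Corpus-level WER difference (reverb - clean) with a 95% percentile bootstrap CI.

    Assumption: utterances are resampled with replacement, and the same
    resampled indices are used for both conditions, which keeps the pairing.
    """
    rng = random.Random(seed)
    n = len(ref_len)
    point = (sum(reverb_err) - sum(clean_err)) / sum(ref_len)
    deltas = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        words = sum(ref_len[i] for i in idx)
        deltas.append(sum(reverb_err[i] - clean_err[i] for i in idx) / words)
    deltas.sort()
    return point, percentile(deltas, 0.025), percentile(deltas, 0.975)

# Toy data (illustrative only): reference text and two ASR hypotheses per utterance.
refs        = ["the cat sat on the mat", "a quick brown fox", "hello world again"]
clean_hyps  = ["the cat sat on the mat", "a quick brown fox", "hello word again"]
reverb_hyps = ["the cat sat on a mat",   "a quick frown fox", "hello word a gain"]

ref_len    = [len(r.split()) for r in refs]
clean_err  = [word_errors(r, h) for r, h in zip(refs, clean_hyps)]
reverb_err = [word_errors(r, h) for r, h in zip(refs, reverb_hyps)]

point, lo, hi = paired_wer_delta(clean_err, reverb_err, ref_len)
print(f"delta WER = {100 * point:.2f} points (95% CI {100 * lo:.2f} to {100 * hi:.2f})")
```

Resampling utterance indices once per replicate and applying them to both conditions is what makes the interval a paired one; resampling clean and reverberant results independently would typically give a wider interval.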
Abstract
Despite decades of research on reverberant speech, comparing methods remains difficult because most corpora lack per-file acoustic annotations or provide limited documentation for reproduction. We present RIR-Mega-Speech, a corpus of approximately 117.5 hours created by convolving LibriSpeech utterances with roughly 5,000 simulated room impulse responses from the RIR-Mega collection. Every file includes RT60, direct-to-reverberant ratio (DRR), and clarity index computed from the source RIR using clearly defined, reproducible procedures. We also provide scripts to rebuild the dataset and reproduce all evaluation results. Using Whisper small on 1,500 paired utterances, we measure 5.20% WER (95% CI: 4.69 to 5.78) on clean speech and 7.70% (7.04 to 8.35) on reverberant versions, corresponding to a paired increase of 2.50 percentage points (2.06 to 2.98). This represents a 48%…
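The construction described in the abstract (convolve a dry utterance with an RIR, then annotate the file with metrics computed from that RIR) can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the exact metric definitions are not given in this summary, so the sketch uses common textbook choices (a direct-path window of ±2.5 ms around the strongest peak for DRR, a 50 ms early/late boundary for clarity, and Schroeder backward integration with a line fit between -5 and -35 dB for RT60). All function names are hypothetical, the RIR is synthetic, and a short sine burst stands in for a LibriSpeech utterance. It uses only the standard library to stay self-contained; a real pipeline would more likely use NumPy arrays and FFT-based convolution such as `scipy.signal.fftconvolve`.

```python
import math
import random

def convolve(x, h):
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def energy(seg):
    return sum(v * v for v in seg)

def peak_index(h):
    """Index of the strongest sample, taken here as the direct-path arrival."""
    return max(range(len(h)), key=lambda n: abs(h[n]))

def drr_db(h, fs, half_window_ms=2.5):
    """Energy inside a window around the direct peak vs. energy outside it (assumed definition)."""
    peak, w = peak_index(h), int(half_window_ms * 1e-3 * fs)
    lo, hi = max(0, peak - w), min(len(h), peak + w + 1)
    return 10 * math.log10(energy(h[lo:hi]) / (energy(h[:lo]) + energy(h[hi:])))

def clarity_db(h, fs, boundary_ms=50.0):
    """Early-to-late energy ratio, boundary measured from the direct peak (assumed definition)."""
    split = min(len(h), peak_index(h) + int(boundary_ms * 1e-3 * fs))
    return 10 * math.log10(energy(h[:split]) / energy(h[split:]))

def rt60_schroeder(h, fs, fit_db=(-5.0, -35.0)):
    """RT60 from the slope of the Schroeder energy decay curve (assumed -5..-35 dB fit range)."""
    edc, acc = [0.0] * len(h), 0.0
    for n in range(len(h) - 1, -1, -1):      # backward integration of squared RIR
        acc += h[n] * h[n]
        edc[n] = acc
    pts = [(n / fs, 10 * math.log10(e / edc[0])) for n, e in enumerate(edc) if e > 0]
    pts = [(t, d) for t, d in pts if fit_db[1] <= d <= fit_db[0]]
    tm = sum(t for t, _ in pts) / len(pts)
    dm = sum(d for _, d in pts) / len(pts)
    slope = sum((t - tm) * (d - dm) for t, d in pts) / sum((t - tm) ** 2 for t, _ in pts)
    return -60.0 / slope                     # seconds needed for a 60 dB decay

def synthetic_rir(fs, rt60, length_s=0.8, delay=40, tail_gain=0.05, seed=0):
    """Toy RIR: unit direct impulse plus a random-sign tail decaying 60 dB per rt60 seconds."""
    rng = random.Random(seed)
    k = 3.0 * math.log(10.0) / rt60          # amplitude decay rate (1/s)
    h = [0.0] * int(length_s * fs)
    h[delay] = 1.0
    for n in range(delay + 1, len(h)):
        h[n] = tail_gain * rng.choice((-1.0, 1.0)) * math.exp(-k * (n - delay) / fs)
    return h

fs = 8000
rir = synthetic_rir(fs, rt60=0.4)
dry = [math.sin(2 * math.pi * 440 * n / fs) for n in range(160)]  # 20 ms stand-in signal
wet = convolve(dry, rir)                                          # reverberant version
metadata = {
    "rt60_s": rt60_schroeder(rir, fs),
    "drr_db": drr_db(rir, fs),
    "clarity_db": clarity_db(rir, fs),
}
print(metadata)
```

Computing the metrics from the source RIR rather than from the reverberant audio keeps the annotations independent of the speech content, which matches the abstract's statement that each file's values are derived from its RIR.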
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
