RIR-Mega-Speech: A Reverberant Speech Corpus with Comprehensive Acoustic Metadata and Reproducible Evaluation

Mandip Goswami

arXiv:2601.19949 · eess.AS · January 29, 2026

TL;DR

The paper introduces RIR-Mega-Speech, a reverberant speech corpus of approximately 117.5 hours in which every file carries acoustic metadata (RT60, DRR, and $C_{50}$), together with scripts that rebuild the dataset and reproduce the evaluation, so that reverberant speech recognition methods can be compared on a common footing.

Contribution

It provides a reverberant speech dataset with per-file acoustic annotations, plus scripts to rebuild the corpus and reproduce all evaluation results, addressing the lack of per-file annotations and reproduction documentation in most existing reverberant speech corpora.

Findings

01

With Whisper small on 1,500 paired utterances, reverberation raises WER from 5.20% to 7.70%, a paired increase of 2.50 percentage points (95% CI: 2.06–2.98).

02

WER correlates positively with RT60 and negatively with DRR.

03

Recognition performance on reverberant speech degrades predictably as reverberation increases.
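
One common way to attach a confidence interval to a paired WER difference such as the one in finding 01 is a percentile bootstrap over utterances. The source does not say which interval method was used, so the sketch below is an assumption, not the paper's procedure. The function name `paired_wer_delta_ci`, the row layout, and the toy data are all hypothetical and exist only to make the example runnable.

```python
import random


def paired_wer_delta_ci(rows, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for a paired WER difference.

    rows: one (ref_words, errors_clean, errors_reverb) tuple per utterance.
    Returns (delta, lo, hi) in percentage points (reverberant minus clean).
    """
    def delta(sample):
        # Corpus-level WER difference: pooled errors over pooled reference words.
        words = sum(r[0] for r in sample)
        extra_errors = sum(r[2] for r in sample) - sum(r[1] for r in sample)
        return 100.0 * extra_errors / words

    rng = random.Random(seed)
    n = len(rows)
    # Resample whole utterances so the clean/reverberant pairing is preserved.
    boots = sorted(
        delta([rows[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return delta(rows), lo, hi


# Toy data (synthetic, NOT from the corpus): reverberant errors never fall
# below the clean errors for the same utterance.
toy_rng = random.Random(1)
rows = []
for _ in range(200):
    ref_words = toy_rng.randint(10, 30)
    errors_clean = toy_rng.choice([0, 0, 0, 1, 2])
    errors_reverb = errors_clean + toy_rng.choice([0, 0, 1])
    rows.append((ref_words, errors_clean, errors_reverb))

delta_pp, ci_lo, ci_hi = paired_wer_delta_ci(rows)
print(f"paired WER increase: {delta_pp:.2f} pp (95% CI: {ci_lo:.2f} to {ci_hi:.2f})")
```

Resampling utterances rather than individual words keeps each clean/reverberant pair together, which is what makes the interval a paired one.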

Abstract

Despite decades of research on reverberant speech, comparing methods remains difficult because most corpora lack per-file acoustic annotations or provide limited documentation for reproduction. We present RIR-Mega-Speech, a corpus of approximately 117.5 hours created by convolving LibriSpeech utterances with roughly 5,000 simulated room impulse responses from the RIR-Mega collection. Every file includes RT60, direct-to-reverberant ratio (DRR), and clarity index ($C_{50}$) computed from the source RIR using clearly defined, reproducible procedures. We also provide scripts to rebuild the dataset and reproduce all evaluation results. Using Whisper small on 1,500 paired utterances, we measure 5.20% WER (95% CI: 4.69–5.78) on clean speech and 7.70% (7.04–8.35) on reverberant versions, corresponding to a paired increase of 2.50 percentage points (2.06–2.98). This represents a 48%…
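
To make the three per-file annotations concrete, here is a minimal sketch of how RT60, DRR, and $C_{50}$ can be estimated from a single impulse response. It is not the corpus's own procedure, which the abstract does not spell out. The assumptions are: the strongest tap marks the direct path, the DRR direct window is ±2.5 ms around it, and RT60 is extrapolated from a line fit to the Schroeder decay curve between -5 and -35 dB. The function name `rir_metrics` and the synthetic RIR are hypothetical, and only the standard library is used so the block stays self-contained.

```python
import math
import random
from itertools import accumulate


def rir_metrics(rir, fs, direct_window_ms=2.5):
    """Return (rt60_s, drr_db, c50_db) for a mono RIR given as a list of floats."""
    energy = [x * x for x in rir]
    # Assumption: the strongest tap marks the direct-path arrival.
    onset = max(range(len(rir)), key=lambda i: abs(rir[i]))

    # DRR: energy in a short window around the direct path vs. everything after.
    half = round(direct_window_ms * 1e-3 * fs)
    lo, hi = max(0, onset - half), onset + half + 1
    drr_db = 10 * math.log10(sum(energy[lo:hi]) / sum(energy[hi:]))

    # C50: energy in the first 50 ms after the onset vs. the remainder.
    split = onset + round(0.050 * fs)
    c50_db = 10 * math.log10(sum(energy[onset:split]) / sum(energy[split:]))

    # RT60: Schroeder backward integration gives the energy decay curve (EDC).
    edc = list(accumulate(reversed(energy[onset:])))[::-1]
    points = []
    for i, e in enumerate(edc):
        if e <= 0:
            break
        level_db = 10 * math.log10(e / edc[0])
        if -35.0 <= level_db <= -5.0:
            points.append((i / fs, level_db))

    # Least-squares slope (dB per second) over the -5..-35 dB range,
    # extrapolated to a 60 dB decay.
    mean_t = sum(t for t, _ in points) / len(points)
    mean_l = sum(l for _, l in points) / len(points)
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in points)
             / sum((t - mean_t) ** 2 for t, _ in points))
    return -60.0 / slope, drr_db, c50_db


# Synthetic RIR (NOT from RIR-Mega): a unit direct tap followed by
# exponentially decaying noise whose energy drops 60 dB in true_rt60 seconds.
fs = 16000
true_rt60 = 0.5
noise_rng = random.Random(0)
decay = 3.0 * math.log(10) / true_rt60  # amplitude decay rate in 1/s
rir = [0.05 * noise_rng.gauss(0, 1) * math.exp(-decay * n / fs) for n in range(fs)]
rir[0] = 1.0

rt60_s, drr_db, c50_db = rir_metrics(rir, fs)
print(f"RT60 = {rt60_s:.2f} s, DRR = {drr_db:.1f} dB, C50 = {c50_db:.1f} dB")
```

Because the 50 ms early window contains the direct window in this setup, $C_{50}$ can never be lower than DRR here. Real pipelines differ in window lengths and fit ranges, so values are only comparable across corpora when those choices are documented.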

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing