Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics

Mandip Goswami

arXiv:2603.02252 · eess.AS · March 17, 2026

TL;DR

Whisper-RIR-Mega is a new benchmark of paired clean and reverberant speech for evaluating ASR robustness to room acoustics; baseline results show that reverberation degrades every Whisper model tested, from tiny to large-v3.

Contribution

Introduces a paired clean/reverberant speech dataset, built from LibriSpeech utterances and real room impulse responses from the RIR-Mega corpus, for benchmarking ASR robustness to reverberation.
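Pairing a clean utterance with its reverberant copy amounts to convolving the clean waveform with a room impulse response. The sketch below assumes mono NumPy arrays at a shared sample rate (LibriSpeech is 16 kHz, so an RIR at another rate would need resampling first). Truncating the output to the clean length and matching RMS levels are illustrative choices, not steps the source confirms, and the function name `make_reverberant` is hypothetical.

```python
import numpy as np


def make_reverberant(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve a clean mono waveform with a room impulse response.

    Assumes both signals share one sample rate. Truncating to the clean
    length and RMS-matching are illustrative choices (hypothetical), not
    the benchmark's confirmed pipeline.
    """
    clean = np.asarray(clean, dtype=np.float64)
    rir = np.asarray(rir, dtype=np.float64)
    # Full linear convolution has length len(clean) + len(rir) - 1;
    # keep only the first len(clean) samples so the pair stays aligned.
    wet = np.convolve(clean, rir, mode="full")[: len(clean)]
    # Rescale so the reverberant copy has the same RMS level as the clean one.
    clean_rms = np.sqrt(np.mean(clean ** 2))
    wet_rms = np.sqrt(np.mean(wet ** 2))
    if wet_rms > 0:
        wet *= clean_rms / wet_rms
    return wet
```

With an identity RIR (`[1.0]`) the output equals the input, which makes a quick sanity check. Keeping the full convolution tail instead of truncating would preserve the late reverberation after the last word, at the cost of the clean and reverberant waveforms no longer having equal length.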

Findings

1. Reverberation degrades ASR performance for every Whisper model evaluated.
2. Larger models are more robust to reverberation: Whisper-large-v3 shows the smallest penalty and Whisper-tiny the largest.
3. Reverberation increases word error rate by 2.31 to 15.50 percentage points, depending on the model.
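The reverb penalty is the reverberant error rate minus the clean error rate, expressed in percentage points. A minimal sketch, assuming corpus-level WER (total word edits divided by total reference words) and transcripts that are already text-normalized; the source does not specify its normalizer or aggregation, and the helper names are hypothetical. Setting `unit="char"` gives a CER that counts spaces as characters, which is only one of several conventions.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of ref token r
                cur[j - 1] + 1,          # insertion of hyp token h
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = cur
    return prev[-1]


def corpus_error_rate(refs, hyps, unit="word"):
    """Corpus-level WER (unit="word") or CER (unit="char") as a fraction."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    tokenize = str.split if unit == "word" else list
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return errors / total


def reverb_penalty_pp(refs, clean_hyps, reverb_hyps, unit="word"):
    """Reverberant minus clean error rate, in percentage points."""
    clean = corpus_error_rate(refs, clean_hyps, unit)
    reverb = corpus_error_rate(refs, reverb_hyps, unit)
    return 100.0 * (reverb - clean)
```

For example, a single reference `"the cat sat"` transcribed correctly from clean audio and as `"the hat sat"` from reverberant audio gives WERs of 0 and 1/3, so `reverb_penalty_pp` returns about 33.3 percentage points.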

Abstract

We introduce Whisper-RIR-Mega, a benchmark dataset of paired clean and reverberant speech for evaluating automatic speech recognition (ASR) robustness to room acoustics. Each sample pairs a clean LibriSpeech utterance with the same utterance convolved with a real room impulse response from the RIR-Mega corpus; the splits are stratified by reverberation time (RT60) and direct-to-reverberant ratio (DRR). We evaluate five Whisper models (tiny through large-v3) on 1,600 test samples and report word error rate (WER) and character error rate (CER) under clean and reverberant conditions. Reverberation consistently degrades performance across all model sizes; the reverb penalty in WER ranges from 2.31 to 15.50 percentage points depending on the model. Whisper-large-v3 shows the smallest penalty and Whisper-tiny the largest. We release the dataset, evaluation code, and baseline results to support…
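The RT60 and DRR strata can be derived from each RIR. The source does not say how these values were computed (the RIR-Mega corpus may ship them as metadata), so what follows is only a sketch of common estimators under that caveat: RT60 from a linear fit to the Schroeder energy decay curve between -5 and -25 dB, extrapolated to 60 dB of decay, and DRR as the energy within ±2.5 ms of the strongest peak over all remaining energy. The function names and the RT60 bin edges in `rt60_bin` are hypothetical.

```python
import numpy as np


def schroeder_edc_db(rir: np.ndarray) -> np.ndarray:
    """Energy decay curve in dB (0 dB at t=0) via Schroeder backward integration."""
    energy = np.asarray(rir, dtype=np.float64) ** 2
    edc = np.cumsum(energy[::-1])[::-1]
    edc = edc / edc[0]
    # Floor tiny values so log10 never sees zero.
    return 10.0 * np.log10(np.maximum(edc, 1e-12))


def estimate_rt60(rir: np.ndarray, fs: int, lo_db: float = -5.0, hi_db: float = -25.0) -> float:
    """RT60 in seconds, extrapolated from the decay slope between lo_db and hi_db (a T20 fit)."""
    edc = schroeder_edc_db(rir)
    t = np.arange(len(edc)) / fs
    mask = (edc <= lo_db) & (edc >= hi_db)
    if np.count_nonzero(mask) < 2:
        raise ValueError("EDC does not span the fitting range")
    slope, _intercept = np.polyfit(t[mask], edc[mask], 1)  # dB per second, negative
    return float(-60.0 / slope)


def estimate_drr_db(rir: np.ndarray, fs: int, window_ms: float = 2.5) -> float:
    """Direct-to-reverberant ratio in dB around the strongest peak."""
    rir = np.asarray(rir, dtype=np.float64)
    peak = int(np.argmax(np.abs(rir)))
    half = int(round(window_ms * 1e-3 * fs))
    start, stop = max(peak - half, 0), peak + half + 1
    direct = np.sum(rir[start:stop] ** 2)
    reverberant = np.sum(rir[:start] ** 2) + np.sum(rir[stop:] ** 2)
    if reverberant == 0:
        return float("inf")
    return float(10.0 * np.log10(direct / reverberant))


def rt60_bin(rt60_s: float, edges: tuple = (0.3, 0.6, 1.0)) -> int:
    """Map an RT60 to a stratum index; the edges are hypothetical, not the paper's."""
    for k, edge in enumerate(edges):
        if rt60_s < edge:
            return k
    return len(edges)
```

Fitting over -5 to -25 dB rather than the full 60 dB avoids the noise floor that dominates the tail of measured RIRs; stratifying on both RT60 and DRR separates long-decay rooms from setups where the talker is simply far from the microphone.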

Code & Models

Models

mandipgoswami/whisper-medium-rirmega

Datasets

mandipgoswami/whisper-rirmega-bench

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis