Treble10: A high-quality dataset for far-field speech recognition, dereverberation, and enhancement

Sarabeth S. Mullins; Georg Götz; Eric Bezzam; Steven Zheng; Daniel Gert Nielsen

arXiv:2510.23141 · eess.AS · October 28, 2025

TL;DR

Treble10 is a large, physically accurate dataset of room impulse responses and reverberant speech, designed to improve far-field speech recognition and enhancement by bridging the gap between measured and simulated data.

Contribution

The paper introduces Treble10, a comprehensive, high-fidelity room-acoustic dataset that combines wave-based and geometrical acoustics simulations for diverse far-field speech applications.

Findings

01

Provides over 3000 broadband RIRs simulated in 10 fully furnished real-world rooms

02

Includes paired reverberant and clean speech data

03

Enables reproducible evaluation and data augmentation

Abstract

Accurate far-field speech datasets are critical for tasks such as automatic speech recognition (ASR), dereverberation, speech enhancement, and source separation. However, current datasets are limited by the trade-off between acoustic realism and scalability. Measured corpora provide faithful physics but are expensive, low-coverage, and rarely include paired clean and reverberant data. In contrast, most simulation-based datasets rely on simplified geometrical acoustics, thus failing to reproduce key physical phenomena like diffraction, scattering, and interference that govern sound propagation in complex environments. We introduce Treble10, a large-scale, physically accurate room-acoustic dataset. Treble10 contains over 3000 broadband room impulse responses (RIRs) simulated in 10 fully furnished real-world rooms, using a hybrid simulation paradigm implemented in the Treble SDK that…
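
As a rough illustration of how an RIR relates clean speech to its reverberant counterpart, the sketch below convolves a clean signal with an impulse response. It is a minimal example under stated assumptions, not the Treble10 pipeline: the `reverberate` helper is hypothetical, the "speech" is random noise, and the RIR is a synthetic exponentially decaying noise burst rather than a response from the dataset. No dataset file layout or loader API is assumed.

```python
import numpy as np


def reverberate(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Return the full linear convolution of a mono signal with an RIR.

    Hypothetical helper for illustration; output length is
    len(clean) + len(rir) - 1 samples.
    """
    return np.convolve(clean, rir, mode="full")


# Toy inputs (assumptions, not Treble10 data).
fs = 16000                                  # sample rate in Hz
rng = np.random.default_rng(0)
clean = rng.standard_normal(fs)             # 1 s of noise standing in for speech

t = np.arange(int(0.3 * fs)) / fs           # 0.3 s time axis for the RIR
rir = rng.standard_normal(t.size) * np.exp(-t / 0.05)  # decaying noise tail
rir[0] = 1.0                                # unit direct-path component

reverberant = reverberate(clean, rir)

# Truncating to the clean length gives a sample-aligned (clean, reverberant) pair.
paired_reverberant = reverberant[: clean.size]
```

With a real corpus, `rir` would be replaced by a loaded impulse response at the same sample rate as the speech. For long responses an FFT-based convolution is usually faster than the direct form used here.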
