Towards Improving Speaker Distance Estimation through Generative Impulse Response Augmentation
Anton Ratnarajah, Mehmet Ergezer, Arun Nair, Mrudula Athi

TL;DR
This paper presents a data augmentation method that uses generated room impulse responses (RIRs) to improve speaker distance estimation (SDE), reducing the mean absolute error from 1.66 m to 0.6 m in GWA rooms and from 2.18 m to 0.69 m in Treble rooms.
Contribution
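It introduces an RIR augmentation pipeline that generates RIRs with FastRIR conditioned only on speaker and listener locations, filters them for alignment with the challenge RIRs, and fine-tunes the SDE model using hyperparameter optimization.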
Findings
MAE reduced from 1.66 m to 0.6 m in GWA rooms
MAE reduced from 2.18 m to 0.69 m in Treble rooms
Augmentation significantly improves medium- to long-distance estimation
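To make the reported metric concrete, here is a minimal Python sketch of mean absolute error and of the relative reduction implied by the figures above. The function names are illustrative and not taken from the paper; only the four MAE values come from the source.

```python
def mean_absolute_error(true_m, predicted_m):
    """Average absolute gap, in metres, between true and estimated distances."""
    if len(true_m) != len(predicted_m) or not true_m:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(true_m, predicted_m)) / len(true_m)


def relative_reduction(before, after):
    """Fraction by which an error metric dropped (0.5 means it was halved)."""
    return (before - after) / before


# MAE values reported in the findings (metres).
gwa = relative_reduction(1.66, 0.6)
treble = relative_reduction(2.18, 0.69)
print(f"GWA: {gwa:.1%}, Treble: {treble:.1%}")  # GWA: 63.9%, Treble: 68.3%
```

In relative terms, the reported figures correspond to roughly a 64% drop in MAE for GWA rooms and a 68% drop for Treble rooms.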
Abstract
The Room Acoustics and Speaker Distance Estimation (SDE) Challenge at ICASSP 2025 explores the effectiveness of augmented room impulse response (RIR) data for improving SDE model performance. This challenge at GenDARA involves generating RIRs to supplement sparse datasets and fine-tuning SDE models with the augmented data. We employ the open-source fast diffuse room impulse response generator (FastRIR), conditioned only on speaker and listener locations. We design a quality filter to ensure that the generated RIRs align with the challenge RIRs, and we use hyperparameter optimization for model fine-tuning. Our approach reduces the mean absolute error (MAE) of the five positions from 1.66 m to 0.6 m for GWA rooms and from 2.18 m to 0.69 m for Treble rooms. The results demonstrate that the augmentation approach significantly improves estimation accuracy, particularly at medium to long distances.
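The abstract does not say how the generated RIRs are turned into training audio. A common approach, assumed here rather than confirmed by the source, is to convolve dry (anechoic) speech with an RIR to simulate how it would sound at a given speaker and listener position. The sketch below uses plain Python and toy values for illustration. The `convolve` helper and the sample arrays are hypothetical, and a real pipeline would more likely use an optimized routine from a numerical library.

```python
def convolve(signal, rir):
    """Full discrete convolution of a dry signal with a room impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


# Toy data (assumed for illustration, not from the paper).
dry = [1.0, 0.5, -0.25]      # a few samples of dry speech
toy_rir = [1.0, 0.0, 0.5]    # direct path plus one delayed reflection
wet = convolve(dry, toy_rir)
print(wet)  # [1.0, 0.5, 0.25, 0.25, -0.125]
```

Each nonzero tap in the RIR adds a delayed, scaled copy of the dry signal, which is why the output is longer than the input by the RIR length minus one sample.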
