Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation
Michael Neri, Archontis Politis, Tuomas Virtanen

TL;DR
This study investigates which components of the room impulse response (RIR) drive single-channel speaker distance estimation accuracy under different calibration scenarios, and shows that early reflections and calibration matter most.
Contribution
It decomposes simulated RIRs into four variants (full, direct-only, no-late, and no-early) and evaluates how each affects distance estimation across calibration conditions.
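As a rough illustration of the decomposition described above, the sketch below splits an RIR into the four variants with simple time-domain masks. It is not the authors' implementation. The function name `decompose_rir`, the direct-path window length, the choice of the strongest peak as the direct path, and the reading of no-early as direct plus late (and no-late as direct plus early) are all assumptions made for this example. The mixing time is taken as a given input rather than estimated from the echo density function.

```python
import numpy as np

def decompose_rir(rir, fs, mixing_time_s, direct_window_s=0.0025):
    """Split an RIR into full, direct-only, no-late, and no-early variants.

    Assumptions (not from the paper): the direct path is the largest
    absolute peak, the direct sound ends a short window after that peak,
    and the mixing time is measured from the direct-path arrival.
    """
    rir = np.asarray(rir, dtype=float)
    n = rir.size
    peak = int(np.argmax(np.abs(rir)))                # assumed direct-path sample
    direct_end = min(n, peak + int(round(direct_window_s * fs)) + 1)
    mix = min(n, peak + int(round(mixing_time_s * fs)))
    mix = max(mix, direct_end)                        # keep the regions ordered

    idx = np.arange(n)
    direct_mask = idx < direct_end                    # propagation delay + direct sound
    early_mask = (idx >= direct_end) & (idx < mix)    # early reflections
    late_mask = idx >= mix                            # late reverberation

    return {
        "full": rir.copy(),
        "direct_only": rir * direct_mask,
        "no_late": rir * (direct_mask | early_mask),
        "no_early": rir * (direct_mask | late_mask),
    }
```

Because the three masks partition the time axis, the no-late and no-early variants add up to the full RIR once the shared direct part is subtracted, which is a handy sanity check on any such split.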
Findings
Early reflections are the most informative component for distance estimation.
Without time calibration, the mean absolute error (MAE) rises to 1.29 m.
With calibration, the model reaches an MAE of 0.14 m by exploiting the propagation delay.
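To see why propagation delay is such a strong cue when capture is synchronised, the sketch below converts the arrival time of the direct path into a distance. It illustrates the underlying physics only and is not the learned model evaluated in the paper. The function name `distance_from_delay`, the speed of sound value, and the use of the strongest peak as the direct-path arrival are assumptions made for this example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C

def distance_from_delay(rir, fs, c=SPEED_OF_SOUND):
    """Estimate source distance from the direct-path delay of an RIR.

    Assumes sample 0 coincides with the emission time (time-calibrated
    capture) and that the strongest peak is the direct path.
    """
    onset = int(np.argmax(np.abs(np.asarray(rir, dtype=float))))
    return c * onset / fs  # distance = speed * time of flight
```

With an arbitrary recording onset, the sample index of the direct path no longer reflects time of flight, so this cue is unavailable and any estimator has to fall back on reverberation-based information.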
Abstract
Single-channel speaker distance estimation has recently achieved centimeter-level accuracy in simulated environments, yet it remains unclear which components of the room impulse response (RIR) the model exploits and how performance depends on the recording conditions. In this work, we decompose simulated RIRs into four variants (full, direct-only, no-late, and no-early) using the mixing time estimated from the echo density function as the boundary between early reflections and late reverberation. We define four calibration scenarios, from fully calibrated (synchronised capture, known source level) to fully uncalibrated (arbitrary onset, unknown level), and evaluate all combinations on a matched dataset. Results show that without time calibration, mean absolute error (MAE) increases to 1.29 m and the model extracts reverberation-based cues, with early reflections emerging as the most…
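One way to picture the calibration scenarios is as two independent switches, one for time and one for level. The sketch below is a hypothetical simulation, not the paper's data pipeline. The function name `apply_calibration_scenario`, the maximum onset shift, and the gain range are assumptions, and treating the two intermediate scenarios as "time only" and "level only" is a guess, since the abstract names only the two extremes.

```python
import numpy as np

def apply_calibration_scenario(signal, fs, time_calibrated, level_calibrated,
                               rng, max_shift_s=0.1, gain_db_range=(-20.0, 0.0)):
    """Degrade a reverberant signal according to a calibration scenario.

    Hypothetical parameters: max_shift_s and gain_db_range are illustrative
    values, not taken from the paper.
    """
    out = np.asarray(signal, dtype=float).copy()
    if not time_calibrated:
        # Arbitrary onset: prepend a random amount of silence so the
        # absolute arrival time no longer encodes the propagation delay.
        shift = int(rng.integers(0, int(max_shift_s * fs) + 1))
        out = np.concatenate([np.zeros(shift), out])
    if not level_calibrated:
        # Unknown source level: apply a random gain so the absolute
        # amplitude no longer encodes distance attenuation.
        gain_db = rng.uniform(*gain_db_range)
        out = out * 10.0 ** (gain_db / 20.0)
    return out
```

Calling it with both flags set to `True` returns the signal unchanged (fully calibrated), while setting both to `False` gives the fully uncalibrated case.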
