Reverberation-Robust Localization of Speakers Using Distinct Speech Onsets and Multi-channel Cross-Correlations

Shoufeng Lin

arXiv:2604.01524 · eess.AS · April 3, 2026

TL;DR

This paper introduces two novel algorithms for localizing speakers in reverberant environments, leveraging speech onsets and multi-microphone correlations to improve accuracy under challenging acoustic conditions.

Contribution

The paper presents new algorithms that enhance speaker localization robustness in reverberant settings by using speech onset detection and multi-channel cross-correlation techniques.

Findings

01 Algorithms reliably locate static and moving speakers in reverberant rooms.

02 The proposed methods outperform some state-of-the-art localization techniques.

03 The methods are effective in both simulated and real reverberant environments.

Abstract

Many speaker localization methods can be found in the literature. However, speaker localization under strong reverberation remains a major challenge in real-world applications. This paper proposes two algorithms for localizing speakers using microphone array recordings of reverberated sounds. To separate concurrent speakers, the first algorithm decomposes microphone signals spectrotemporally into subbands via an auditory filterbank. To suppress reverberation, we propose a novel speech onset detection approach derived from the speech signal and impulse response models, and further propose to formulate the multi-channel cross-correlation coefficient (MCCC) of encoded speech onsets in each subband. The subband results are combined to estimate the directions-of-arrival (DOAs) of speakers. The second algorithm extends the generalized cross-correlation with phase transform (GCC-PHAT)…
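
The abstract builds on two standard correlation measures: GCC-PHAT and the MCCC. The sketch below illustrates only their textbook forms in Python with NumPy and is not the paper's method. It omits the auditory filterbank, the onset detection and encoding, and the paper's extension of GCC-PHAT. GCC-PHAT whitens the cross-power spectrum so that only phase (i.e. delay) information remains. The MCCC is computed here as rho^2 = 1 - det(R_norm), where R_norm is the normalized spatial correlation matrix of the time-aligned channels. The function names `gcc_phat`, `mccc` and `delayed`, the 4-microphone uniform linear array, the integer-sample steering delays and the synthetic white-noise source are all assumptions made for illustration.

```python
import numpy as np

# ASSUMPTION: broadband signals and integer-sample alignment. The paper instead
# applies the MCCC to encoded speech onsets in each auditory subband.


def gcc_phat(x, y, max_lag):
    """Return the lag (in samples) of y relative to x via GCC-PHAT.

    A positive result means y is a delayed copy of x.
    """
    n = len(x) + len(y)                    # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = np.conj(X) * Y                 # cross-power spectrum
    cross /= np.abs(cross) + 1e-12         # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=n)
    # Reorder to lags -max_lag..max_lag (negative lags sit at the end of the buffer).
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag


def mccc(signals, delays):
    """Squared MCCC, rho^2 = 1 - det(R_norm), for one set of integer steering delays.

    signals: (M, N) array, one row per microphone.
    delays:  length-M integer delays (samples) of each channel relative to the source.
    """
    M, N = signals.shape
    d = np.asarray(delays) - np.min(delays)      # shift so all start indices are >= 0
    L = N - np.max(d)                            # common length after alignment
    aligned = np.stack([signals[m, d[m]:d[m] + L] for m in range(M)])
    R = aligned @ aligned.T                      # spatial correlation matrix
    scale = np.sqrt(np.diag(R))
    R_norm = R / np.outer(scale, scale)          # unit diagonal, so det lies in [0, 1]
    return 1.0 - np.linalg.det(R_norm)


def delayed(s, d, n):
    """Hypothetical helper: s delayed by d >= 0 samples, truncated to length n."""
    out = np.zeros(n)
    out[d:] = s[:n - d]
    return out


# Synthetic check: white noise reaching a 4-mic uniform linear array with a
# 3-sample delay between neighbouring mics, plus a little independent sensor noise.
rng = np.random.default_rng(0)
N, M, true_k = 4000, 4, 3
s = rng.standard_normal(N)
signals = np.stack([delayed(s, m * true_k, N) + 0.05 * rng.standard_normal(N)
                    for m in range(M)])

pair_lag = gcc_phat(signals[0], signals[1], max_lag=10)
# For a uniform linear array, mic m lags mic 0 by m*k samples; scan candidate k.
scores = {k: mccc(signals, k * np.arange(M)) for k in range(-8, 9)}
best_k = max(scores, key=scores.get)
print(pair_lag, best_k)
```

The pairwise GCC-PHAT estimate uses two microphones. The MCCC scores each steering hypothesis jointly over all channels, which is the redundancy the first algorithm exploits. Under a far-field assumption with microphone spacing δ, sampling rate fs and speed of sound c, an inter-microphone lag of k samples corresponds to a DOA θ with cos θ = k·c / (fs·δ).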
