SRP-PHAT-NET: A Reliability-Driven DNN for Reverberant Speaker Localization
Bar Shaybet, Vladimir Tourbabin, Boaz Rafaely

TL;DR
This paper introduces SRP-PHAT-NET, a deep neural network that estimates speaker direction-of-arrival in reverberant environments and assesses the reliability of its predictions, improving practical localization accuracy.
Contribution
The work presents a novel DNN framework for direction-of-arrival (DOA) estimation with built-in reliability estimation. The model is trained on Gaussian-weighted labels, and the effect of label smoothing on its output is analyzed.
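As an illustration of Gaussian-weighted labels, here is a minimal sketch that builds a soft target over a grid of candidate directions. The azimuth-only 1-degree grid, the unnormalized peak value of 1, and the names `gaussian_doa_labels` and `sigma_deg` are assumptions made for this example; the summary does not specify the paper's grid or normalization.

```python
import numpy as np

def gaussian_doa_labels(true_az_deg, grid_deg, sigma_deg):
    """Soft label over candidate azimuths, peaked at the true direction.

    Assumption: azimuth-only grid in degrees with 360-degree wrap-around.
    """
    # Signed angular difference wrapped into [-180, 180)
    diff = (grid_deg - true_az_deg + 180.0) % 360.0 - 180.0
    # Gaussian kernel; sigma_deg sets how much the label is smoothed
    return np.exp(-0.5 * (diff / sigma_deg) ** 2)

grid = np.arange(0.0, 360.0, 1.0)          # hypothetical 1-degree grid
labels = gaussian_doa_labels(5.0, grid, sigma_deg=10.0)
narrow = gaussian_doa_labels(5.0, grid, sigma_deg=2.0)
```

A larger `sigma_deg` spreads the target over more neighboring directions, while a smaller one approaches a one-hot label.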
Findings
Reliability-driven predictions improve localization accuracy.
The Gaussian kernel width used for label smoothing affects both accuracy and reliability, and can be tuned to application-specific requirements.
Selective use of high-confidence predictions enhances results.
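The selective use of high-confidence predictions can be sketched as follows. This example assumes that each prediction is a map over candidate directions and that the peak value of the map serves as the reliability score; both are assumptions for illustration, and the function name `select_reliable`, the grid, and the threshold are hypothetical rather than taken from the paper.

```python
import numpy as np

def select_reliable(pred_maps, grid_deg, threshold):
    """Keep only frames whose peak score reaches the threshold.

    pred_maps: (n_frames, n_dirs) network outputs (assumed layout).
    Returns the retained DOA estimates, their scores, and the keep mask.
    """
    peak_idx = pred_maps.argmax(axis=1)
    doa = grid_deg[peak_idx]                 # DOA estimate = peak location
    reliability = pred_maps.max(axis=1)      # assumed score = peak height
    keep = reliability >= threshold
    return doa[keep], reliability[keep], keep

grid = np.array([0.0, 90.0, 180.0, 270.0])   # toy 4-direction grid
pred_maps = np.array([
    [0.10, 0.90, 0.20, 0.10],   # sharp peak: kept
    [0.30, 0.35, 0.30, 0.25],   # flat map: discarded
    [0.05, 0.10, 0.80, 0.10],   # sharp peak: kept
])
doa, score, keep = select_reliable(pred_maps, grid, threshold=0.5)
```

Raising the threshold retains fewer estimates, so in practice it would be chosen by weighing coverage against the accuracy of what is kept.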
Abstract
Accurate Direction-of-Arrival (DOA) estimation in reverberant environments remains a fundamental challenge for spatial audio applications. While deep learning methods have shown strong performance in such conditions, they typically lack a mechanism to assess the reliability of their predictions, an essential feature for real-world deployment. In this work, we present SRP-PHAT-NET, a deep neural network framework that leverages SRP-PHAT directional maps as spatial features and introduces built-in reliability estimation. To enable meaningful reliability scoring, the model is trained using Gaussian-weighted labels centered around the true direction. We systematically analyze the influence of label smoothing on accuracy and reliability, demonstrating that the choice of Gaussian kernel width can be tuned to application-specific requirements. Experimental results show that selectively…
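For readers unfamiliar with SRP-PHAT directional maps, the following is a minimal textbook-style sketch of a frequency-domain SRP-PHAT map under a far-field, free-field model. It is not the paper's feature extraction: the 4-microphone planar array, the azimuth-only grid, the sampling rate, and all function and variable names are assumptions chosen for the example.

```python
import numpy as np

def srp_phat_map(frames, mic_xy, fs, az_grid_deg, c=343.0):
    """Steered response power with PHAT weighting over candidate azimuths.

    frames: (n_mics, n_samples) time-domain signals.
    mic_xy: (n_mics, 2) microphone positions in metres (planar array).
    """
    n_mics, n_samples = frames.shape
    spec = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    az = np.deg2rad(az_grid_deg)
    unit = np.stack([np.cos(az), np.sin(az)], axis=1)   # (n_dirs, 2)
    # Far-field arrival time at each mic relative to the array origin
    tau = -(unit @ mic_xy.T) / c                        # (n_dirs, n_mics)
    srp = np.zeros(len(az))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cross = spec[i] * np.conj(spec[j])
            cross /= np.abs(cross) + 1e-12              # PHAT: keep phase only
            tdoa = tau[:, i] - tau[:, j]                # (n_dirs,)
            steer = np.exp(2j * np.pi * freqs[None, :] * tdoa[:, None])
            srp += np.real(steer @ cross)               # sum over frequency
    return srp

# Toy check: one noise source at 60 degrees, no reverberation or sensor noise.
rng = np.random.default_rng(0)
fs, n, c = 16000, 1023, 343.0
mic_xy = 0.05 * np.array([[1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)
u = np.array([np.cos(np.deg2rad(60.0)), np.sin(np.deg2rad(60.0))])
delays = -(mic_xy @ u) / c
S = np.fft.rfft(rng.standard_normal(n))
f = np.fft.rfftfreq(n, d=1.0 / fs)
frames = np.fft.irfft(S[None, :] * np.exp(-2j * np.pi * f[None, :] * delays[:, None]),
                      n=n, axis=1)
az_grid = np.arange(0.0, 360.0, 1.0)
srp = srp_phat_map(frames, mic_xy, fs, az_grid)
est_az = az_grid[np.argmax(srp)]
```

In this idealized setting the map peaks at the simulated source azimuth; in reverberant rooms the map contains additional peaks from reflections, which is the condition the network is meant to handle.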
