Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization with Spatial Sparsity Regularization
Xiaofei Li, Laurent Girin, Sharon Gannot, Radu Horaud

TL;DR
This paper presents a robust method for localizing multiple speakers in noisy, reverberant environments. It combines binaural features, a GMM with sparsity regularization, and an extension of DP-RTF estimation to multiple sources, and is validated on simulated and real data.
Contribution
It introduces a novel multi-source localization approach combining direct-path features, likelihood maximization with spatial sparsity, and an extended DP-RTF estimation for multiple speakers.
Findings
Effective in noisy, reverberant conditions
Accurate estimation of number and locations of speakers
Validated with both simulated and real data
Abstract
This paper addresses the problem of multiple-speaker localization in noisy and reverberant environments, using binaural recordings of an acoustic scene. A Gaussian mixture model (GMM) is adopted, whose components correspond to all the possible candidate source locations defined on a grid. After optimizing the GMM-based objective function, given an observed set of binaural features, both the number of sources and their locations are estimated by selecting the GMM components with the largest priors. This is achieved by enforcing a sparse solution, thus favoring a small number of speakers with respect to the large number of initial candidate source locations. An entropy-based penalty term is added to the likelihood, thus imposing sparsity over the set of GMM priors. In addition, the direct-path relative transfer function (DP-RTF) is used to build robust binaural features. The DP-RTF,…
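The grid-based GMM with an entropy penalty on its priors can be illustrated with a minimal sketch. This is not the authors' implementation: the scalar azimuth-like observations stand in for the DP-RTF binaural features, the optimizer (exponentiated-gradient steps on the simplex) is a simple substitute chosen for illustration, and the names `sparse_priors`, `sigma`, `gamma`, `step`, and the 0.1 selection threshold are all hypothetical.

```python
import math
import random


def sparse_priors(features, grid, sigma=5.0, gamma=0.1, step=0.5, n_iter=200):
    """Estimate GMM priors over fixed candidate locations with an entropy penalty.

    Maximizes  mean log-likelihood + gamma * sum_k w_k log w_k  over the simplex.
    The second term is the negative entropy of the priors, so it favors sparse w.
    Assumption: exponentiated-gradient ascent is used here only for illustration.
    """
    # lik[n][k]: Gaussian likelihood of observation n under candidate k.
    # The shared normalization constant is dropped because it cancels below.
    lik = [[math.exp(-0.5 * ((x - g) / sigma) ** 2) for g in grid] for x in features]
    n_obs, n_comp = len(features), len(grid)
    w = [1.0 / n_comp] * n_comp  # start from uniform priors
    for _ in range(n_iter):
        # Gradient of the penalty term gamma * sum_k w_k log w_k.
        grad = [gamma * (math.log(wk) + 1.0) for wk in w]
        # Gradient of the mean log-likelihood with respect to each prior.
        for row in lik:
            mix = sum(wk * p for wk, p in zip(w, row)) + 1e-300
            for k, p in enumerate(row):
                grad[k] += p / (mix * n_obs)
        # Multiplicative update; subtracting the max keeps exp() well scaled.
        g_max = max(grad)
        w = [max(wk * math.exp(step * (gk - g_max)), 1e-12) for wk, gk in zip(w, grad)]
        total = sum(w)
        w = [wk / total for wk in w]  # project back onto the simplex
    return w


# Toy data: two hypothetical speakers at -30 and 40 degrees azimuth.
random.seed(0)
grid = list(range(-90, 91, 5))  # candidate locations on a 5-degree grid
true_az = [-30.0, 40.0]
features = [random.gauss(az, 3.0) for az in true_az for _ in range(100)]

priors = sparse_priors(features, grid)
# Keep the components whose prior exceeds a (hypothetical) threshold.
estimated = sorted(g for g, wk in zip(grid, priors) if wk > 0.1)
print(estimated)
```

The number of speakers is not given to the procedure. It follows from how many priors remain large once the penalty has driven the others toward zero, which mirrors the selection step described in the abstract.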
