Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization with Spatial Sparsity Regularization

Xiaofei Li; Laurent Girin; Sharon Gannot; Radu Horaud

arXiv:1611.01172 · cs.SD · October 6, 2017

TL;DR

This paper presents a method for localizing multiple speakers in noisy, reverberant environments from binaural recordings. It combines robust binaural features, a Gaussian mixture model (GMM) with sparsity regularization, and an extension of direct-path relative transfer function (DP-RTF) estimation to multiple sources. The method is validated on simulated and real data.

Contribution

The paper introduces a multi-source localization approach that combines direct-path features, likelihood maximization with a spatial sparsity penalty, and a DP-RTF estimator extended to multiple speakers.

Findings

01  Effective in noisy, reverberant conditions

02  Accurate estimation of the number and locations of speakers

03  Validated with both simulated and real data

Abstract

This paper addresses the problem of multiple-speaker localization in noisy and reverberant environments, using binaural recordings of an acoustic scene. A Gaussian mixture model (GMM) is adopted, whose components correspond to all the possible candidate source locations defined on a grid. After optimizing the GMM-based objective function, given an observed set of binaural features, both the number of sources and their locations are estimated by selecting the GMM components with the largest priors. This is achieved by enforcing a sparse solution, thus favoring a small number of speakers with respect to the large number of initial candidate source locations. An entropy-based penalty term is added to the likelihood, thus imposing sparsity over the set of GMM priors. In addition, the direct-path relative transfer function (DP-RTF) is used to build robust binaural features. The DP-RTF,…
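The idea in the abstract (one GMM component per candidate location, an entropy penalty on the priors, then selection of the largest priors) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the features are scalar noisy azimuths rather than DP-RTF vectors, the component variance is fixed and shared, and the priors are optimized by exponentiated-gradient ascent on the simplex, which may differ from the authors' optimizer. The names `localize`, `gamma`, `eta`, and `min_prior` are hypothetical.

```python
import numpy as np

def localize(features, grid, sigma=4.0, gamma=0.1, eta=0.5,
             n_iter=300, min_prior=0.1):
    """Toy sparse-GMM localization (illustrative assumptions only).

    Maximizes mean log-likelihood minus gamma * entropy(priors), with
    component means fixed to the candidate grid, then keeps the
    candidates whose prior exceeds min_prior.
    """
    # Likelihood of each feature under each candidate location, shape (N, K).
    diff = features[:, None] - grid[None, :]
    lik = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    # Start from uniform priors, stored in the log domain for stability.
    log_w = np.full(grid.size, -np.log(grid.size))
    for _ in range(n_iter):
        w = np.exp(log_w)
        mix = lik @ w  # mixture density at each feature, shape (N,)
        # Gradient of the penalized objective with respect to the priors:
        # the likelihood term plus the gradient of gamma * sum(w * log w).
        grad = (lik / mix[:, None]).mean(axis=0) + gamma * (log_w + 1.0)
        # Exponentiated-gradient step, then renormalize onto the simplex.
        log_w = log_w + eta * grad
        m = log_w.max()
        log_w -= m + np.log(np.exp(log_w - m).sum())
        # Floor negligible priors so the logs stay finite.
        log_w = np.maximum(log_w, -50.0)

    w = np.exp(log_w)
    return w, grid[w > min_prior]

# Synthetic scene: two speakers at -30 and 40 degrees, 200 noisy features each.
rng = np.random.default_rng(0)
features = np.concatenate([rng.normal(-30.0, 4.0, 200),
                           rng.normal(40.0, 4.0, 200)])
grid = np.arange(-90.0, 91.0, 10.0)  # 19 candidate azimuths
priors, detected = localize(features, grid)
print(detected)
```

In this sketch, the number of speakers is never specified in advance: it falls out of how many priors survive the threshold. A larger `gamma` pushes small priors toward zero more aggressively.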
