The importance of spatial and spectral information in multiple speaker tracking
Hanan Beit-On, Vladimir Tourbabin, Boaz Rafaely

TL;DR
This paper introduces a joint spatial-spectral data association method for multi-speaker tracking, improving accuracy by integrating time-frequency masks with spatial information, demonstrated on LOCATA challenge recordings.
Contribution
It presents a novel JPDA-based approach that combines spectral and spatial cues for better speaker association in multi-speaker tracking.
Findings
Enhanced tracking accuracy with joint spatial-spectral information
Improved association performance on LOCATA recordings
Outperforms methods relying on single information channels
Abstract
Multi-speaker localization and tracking using microphone array recording is of importance in a wide range of applications. One of the challenges with multi-speaker tracking is to associate direction estimates with the correct speaker. Most existing association approaches rely on spatial or spectral information alone, leading to performance degradation when one of these information channels is partially known or missing. This paper studies a joint probability data association (JPDA)-based method that facilitates association based on joint spatial-spectral information. This is achieved by integrating speaker time-frequency (TF) masks, estimated based on spectral information, in the association probabilities calculation. An experimental study that tested the proposed method on recordings from the LOCATA challenge demonstrates the enhanced performance obtained by using joint…
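To make the idea of weighting association probabilities with TF masks concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a Gaussian spatial likelihood on the wrapped azimuth error, a constant clutter score, and a simplified per-measurement normalization, whereas full JPDA enumerates joint association events. The function name `association_probs` and the parameters `sigma` and `clutter` are hypothetical.

```python
import numpy as np

def association_probs(doa_meas, doa_pred, masks, sigma=10.0, clutter=1e-3):
    """Per-measurement association probabilities from spatial and spectral cues.

    doa_meas: (M,) measured azimuths in degrees, one per TF bin.
    doa_pred: (S,) predicted azimuth of each tracked speaker in degrees.
    masks:    (S, M) TF-mask value of each speaker at each measurement's bin.
    Returns an (M, S + 1) array; the last column is the clutter hypothesis.
    Simplified illustration only, not the method of the paper.
    """
    doa_meas = np.asarray(doa_meas, dtype=float)
    doa_pred = np.asarray(doa_pred, dtype=float)
    masks = np.asarray(masks, dtype=float)

    # Angular error wrapped into [-180, 180) degrees.
    diff = (doa_meas[:, None] - doa_pred[None, :] + 180.0) % 360.0 - 180.0

    # Assumed spatial likelihood: Gaussian in the wrapped angular error.
    spatial = np.exp(-0.5 * (diff / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

    # Joint spatial-spectral score: spatial likelihood weighted by the mask.
    joint = spatial * masks.T

    # Append the assumed constant clutter score and normalize each row.
    scores = np.hstack([joint, np.full((doa_meas.size, 1), clutter)])
    return scores / scores.sum(axis=1, keepdims=True)

# A direction estimate at 35 degrees lies midway between speakers predicted
# at 30 and 40 degrees, so only the mask can separate the two hypotheses.
probs = association_probs([35.0], [30.0, 40.0], [[0.9], [0.1]])
print(probs)
```

In this toy case the spatial term is identical for both speakers, so the mask alone decides the association. With uniform masks, the same call gives the two speakers equal probability, which is the ambiguity that purely spatial association faces.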
Taxonomy
Topics: Speech and Audio Processing
