Tracking of Intermittent and Moving Speakers: Dataset and Metrics
Taous Iatariene (MULTISPEECH), Alexandre Guérin, Romain Serizel (MULTISPEECH)

TL;DR
This paper addresses the challenge of tracking intermittent and moving speakers, introducing a new dataset and metrics to evaluate track identity maintenance during discontinuous and dynamic speaker movements.
Contribution
It introduces LibriJump, a novel speaker-tracking dataset in which speakers change position during silence, and proposes adapted association metrics for evaluating how well track identities are maintained.
Findings
Association metrics complement existing tracking metrics.
Discontinuous tracks pose unique challenges for speaker tracking.
LibriJump enables evaluation of tracking methods on dynamic, intermittent speaker movements.
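To illustrate why discontinuous tracks are challenging, here is a minimal sketch of a tracker that assigns identities from spatial proximity alone. It is not the paper's method or one of its metrics; the function names, the gating threshold `gate_deg`, and the frame values are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a naive gated nearest-neighbour tracker.
# This is NOT the method or the metrics proposed in the paper.

def angular_distance(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def assign_track_ids(observations, gate_deg=20.0):
    """Assign a track ID to each frame from azimuth proximity alone.

    observations: list of azimuths in degrees, or None for silent frames.
    gate_deg: hypothetical gating threshold for matching an existing track.
    """
    last_azimuth = {}  # track ID -> last observed azimuth
    ids = []
    for az in observations:
        if az is None:
            ids.append(None)  # no active source in this frame
            continue
        best_id, best_dist = None, gate_deg
        for track_id, prev in last_azimuth.items():
            dist = angular_distance(az, prev)
            if dist <= best_dist:
                best_id, best_dist = track_id, dist
        if best_id is None:
            best_id = len(last_azimuth)  # no track within the gate: open a new one
        last_azimuth[best_id] = az
        ids.append(best_id)
    return ids


# One speaker: active near 30 degrees, silent, then active near 120 degrees.
frames = [30.0, 32.0, 34.0, None, None, 120.0, 121.0]
track_ids = assign_track_ids(frames)
print(track_ids)  # [0, 0, 0, None, None, 1, 1]
```

Although a single speaker produced every active frame, the jump during silence falls outside the gate, so the sketch opens a second track. Identity errors of this kind are what association-oriented metrics are intended to expose.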
Abstract
This paper presents the problem of tracking intermittent and moving sources, i.e., sources that may change position when they are inactive. This issue is seldom explored, and most current tracking methods rely on spatial observations for track identity management. They are either based on a previous localization step, or designed to perform joint localization and tracking by predicting ordered position estimates. This raises concerns about whether such methods can maintain reliable track identity assignment performance when dealing with discontinuous spatial tracks, which may be caused by a change of direction during silence. We introduce LibriJump, a novel dataset of acoustic scenes in the First Order Ambisonics format focusing on speaker tracking. The dataset contains speakers with changing positions during inactivity periods, thus simulating discontinuous tracks. To measure the…
Taxonomy
Topics: Speech and Audio Processing
