Speaker Embeddings to Improve Tracking of Intermittent and Moving Speakers
Taous Iatariene (MULTISPEECH), Can Cui (MULTISPEECH), Alexandre Guérin, Romain Serizel (MULTISPEECH)

TL;DR
This paper introduces a speaker embedding-based method that improves speaker tracking accuracy in scenarios with intermittent and moving speakers by reassigning identities post-tracking, using embeddings computed from beamforming-enhanced audio signals.
Contribution
It proposes a novel approach combining beamforming and speaker embeddings for identity reassignment, addressing limitations of traditional spatial observation-based tracking methods.
Findings
Improves identity assignment performance in moving and intermittent speaker scenarios
Beamforming enhances speaker embedding quality and tracking accuracy
Effective for both neural and standard tracking systems
Abstract
Speaker tracking methods often rely on spatial observations to assign coherent track identities over time. This is a limitation in scenarios with intermittent and moving speakers, i.e., speakers that may change position while they are inactive, which leads to discontinuous spatial trajectories. This paper investigates the use of speaker embeddings as a simple solution to this issue. We propose to perform identity reassignment post-tracking, using speaker embeddings. We leverage the trajectory-related information provided by an initial tracking step together with the multichannel audio signal. Beamforming is used to enhance the signal towards the speakers' positions in order to compute speaker embeddings. These are then used to assign new track identities based on an enrollment pool. We evaluate the performance of the proposed speaker embedding-based identity reassignment method on a dataset…
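The reassignment step described in the abstract can be illustrated with a minimal sketch. It assumes that each track from the initial tracking step has already been reduced to a single embedding vector (computed from the beamformed signal) and that the enrollment pool holds one reference embedding per known speaker. The cosine-similarity scoring, the function names, and the toy three-dimensional vectors are all assumptions made for illustration; the abstract does not specify the similarity measure or the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def reassign_identities(track_embeddings, enrollment_pool):
    """Map each initial track ID to the closest enrolled speaker.

    track_embeddings: dict of track ID -> embedding (hypothetical input,
        standing in for embeddings computed on beamformed audio).
    enrollment_pool: dict of speaker label -> reference embedding.
    """
    new_ids = {}
    for track_id, emb in track_embeddings.items():
        # Pick the enrolled speaker whose reference embedding is most similar.
        new_ids[track_id] = max(
            enrollment_pool,
            key=lambda spk: cosine_similarity(emb, enrollment_pool[spk]),
        )
    return new_ids


# Toy data (assumed): two enrolled speakers, three tracks from the tracker.
enrollment_pool = {"spk_A": [1.0, 0.0, 0.0], "spk_B": [0.0, 1.0, 0.0]}
track_embeddings = {
    0: [0.9, 0.1, 0.0],  # first segment of speaker A
    1: [0.1, 0.8, 0.1],  # speaker B
    2: [0.8, 0.2, 0.1],  # speaker A again, after moving while silent
}
new_ids = reassign_identities(track_embeddings, enrollment_pool)
print(new_ids)
```

In this toy case, tracks 0 and 2 both map to the same enrolled speaker, which shows how a spatially discontinuous trajectory could be merged under a single identity.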
