Speaker Embeddings to Improve Tracking of Intermittent and Moving Speakers

Taous Iatariene (MULTISPEECH); Can Cui (MULTISPEECH); Alexandre Guérin; Romain Serizel (MULTISPEECH)

arXiv:2506.19875 · eess.AS · June 26, 2025

TL;DR

This paper introduces a speaker embedding-based method that improves track identity assignment for intermittent and moving speakers. Identities are reassigned after an initial tracking step, using speaker embeddings computed from beamformed (enhanced) audio signals.

Contribution

The paper proposes an approach that combines beamforming and speaker embeddings to reassign track identities, addressing a limitation of tracking methods that rely only on spatial observations.

Findings

01

Improves identity assignment performance in moving and intermittent speaker scenarios

02

Beamforming enhances speaker embedding quality and tracking accuracy

03

Effective for both neural and standard tracking systems

Abstract

Speaker tracking methods often rely on spatial observations to assign coherent track identities over time. This is limiting in scenarios with intermittent and moving speakers, i.e., speakers that may change position while they are inactive, which leads to discontinuous spatial trajectories. This paper investigates the use of speaker embeddings as a simple solution to this issue. We propose to perform identity reassignment post-tracking, using speaker embeddings. We leverage trajectory-related information provided by an initial tracking step, together with the multichannel audio signal. Beamforming is used to enhance the signal towards the speakers' positions in order to compute speaker embeddings. These are then used to assign new track identities based on an enrollment pool. We evaluate the performance of the proposed speaker embedding-based identity reassignment method on a dataset…
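
The final step described in the abstract, assigning new track identities from an enrollment pool, can be illustrated with a minimal sketch. This summary does not specify the embedding extractor, the beamformer, or the scoring rule, so everything below is an assumption made for illustration: each track fragment is assumed to already have one embedding (computed from its beamformed signal), the pool is assumed to hold one reference embedding per enrolled speaker, and the match is assumed to be the highest cosine similarity. The function name `reassign_identities` and the toy 3-dimensional vectors are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def reassign_identities(track_embeddings, enrollment_pool):
    """Map each track fragment to the closest enrolled speaker.

    track_embeddings: dict {track_id: embedding of shape (D,)}
    enrollment_pool:  dict {speaker_id: reference embedding of shape (D,)}
    Returns a dict {track_id: speaker_id}.

    Assumption: cosine similarity with an argmax decision. The paper may
    use a different scoring or assignment rule.
    """
    speaker_ids = list(enrollment_pool)
    pool = l2_normalize(np.stack([enrollment_pool[s] for s in speaker_ids]))

    new_ids = {}
    for track_id, emb in track_embeddings.items():
        # Cosine similarity between this fragment and every enrolled speaker.
        scores = pool @ l2_normalize(emb)
        new_ids[track_id] = speaker_ids[int(np.argmax(scores))]
    return new_ids


# Toy example: the tracker produced three fragments with distinct track IDs,
# although fragments 0 and 2 belong to the same speaker, who moved while silent.
pool = {"spk_A": [1.0, 0.0, 0.0], "spk_B": [0.0, 1.0, 0.0]}
tracks = {0: [0.9, 0.1, 0.0], 1: [0.2, 0.8, 0.1], 2: [0.7, 0.2, 0.1]}

print(reassign_identities(tracks, pool))
```

In this toy case, fragments 0 and 2 are both mapped to `spk_A`, which is the kind of identity recovery that purely spatial tracking cannot provide when a trajectory is discontinuous.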
