Position tracking of a varying number of sound sources with sliding permutation invariant training

David Diaz-Guerra; Archontis Politis; Tuomas Virtanen

arXiv:2210.14536·eess.AS·June 6, 2023·1 cites

Open Access

TL;DR

This paper introduces a training strategy for deep learning sound source localization models that tracks a time-varying number of moving sources, reducing identity switches while maintaining frame-wise localization accuracy.

Contribution

It proposes a straightforward mean squared error-based training method that handles time-varying source counts and preserves source identities across frames.

Findings

01 Reduces identity switches in multi-source tracking

02 Maintains high frame-wise localization accuracy

03 Effective on simulated reverberant moving sources
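To make the first finding concrete, here is a minimal sketch of how identity switches could be counted. It is an illustrative simplification, not the metric used in the paper: the function name `count_identity_switches` is hypothetical, the number of tracks is assumed fixed and equal to the number of sources, and a switch is counted whenever the frame-wise best track-to-source assignment differs from the one in the previous frame.

```python
import itertools
import numpy as np


def count_identity_switches(est, ref):
    """Count frames where the best track-to-source assignment changes.

    est, ref: arrays of shape (T, N, D) holding N estimated and N
    reference positions of dimension D for each of T frames.
    Hypothetical helper; assumes a fixed number of active sources.
    """
    num_frames, num_tracks, _ = est.shape
    perms = list(itertools.permutations(range(num_tracks)))
    switches = 0
    previous = None
    for t in range(num_frames):
        # Squared error at frame t for every possible assignment.
        costs = [np.sum((est[t, list(p)] - ref[t]) ** 2) for p in perms]
        best = perms[int(np.argmin(costs))]
        if previous is not None and best != previous:
            switches += 1
        previous = best
    return switches
```

A tracker whose output tracks follow the same sources throughout a sequence yields a count of zero under this definition, even if its track order differs from the reference order.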

Abstract

Recent data- and learning-based sound source localization (SSL) methods have shown strong performance in challenging acoustic scenarios. However, little work has been done on adapting such methods to consistently track multiple sources that appear and disappear, as would occur in reality. In this paper, we present a new training strategy for deep learning SSL models with a straightforward implementation based on the mean squared error of the optimal association between estimated and reference positions in the preceding time frames. It optimizes the desired properties of a tracking system: handling a time-varying number of sources and ordering localization estimates according to their trajectories, minimizing identity switches (IDSs). Evaluation on simulated data of multiple reverberant moving sources and on two model architectures demonstrates its effectiveness in reducing identity switches…
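The loss described in the abstract can be sketched as follows. This is one reading of the abstract under stated assumptions, not the authors' implementation: the function name `sliding_pit_loss` and the `window` parameter are hypothetical, the number of tracks is assumed fixed, source activity is not modelled, and NumPy is used in place of a deep learning framework. For each frame, the sketch picks the permutation of output tracks with the lowest summed squared error over the current and preceding frames in the window, then scores the current frame with that permutation.

```python
import itertools
import numpy as np


def sliding_pit_loss(est, ref, window):
    """Sliding-window permutation invariant MSE (illustrative sketch).

    est, ref: arrays of shape (T, N, D) with N tracks of dimension D
    over T frames. window: number of frames (current plus preceding)
    used to select the permutation. Returns the mean loss over frames
    and the permutation chosen at each frame.
    """
    num_frames, num_tracks, _ = est.shape
    perms = list(itertools.permutations(range(num_tracks)))
    # frame_cost[t, k]: MSE at frame t when estimated track perms[k][j]
    # is compared with reference source j.
    frame_cost = np.stack(
        [((est[:, list(p), :] - ref) ** 2).mean(axis=(1, 2)) for p in perms],
        axis=1,
    )
    losses = np.empty(num_frames)
    chosen = []
    for t in range(num_frames):
        start = max(0, t - window + 1)
        # Accumulate each permutation's cost over the sliding window.
        windowed = frame_cost[start:t + 1].sum(axis=0)
        best = int(np.argmin(windowed))
        losses[t] = frame_cost[t, best]
        chosen.append(perms[best])
    return losses.mean(), chosen
```

With `window=1` this reduces to frame-wise permutation invariant training, where the assignment may change from one frame to the next. A longer window penalizes such changes, because a permutation that fits only the current frame pays for its mismatch in the preceding frames. Enumerating all permutations is practical only for a small number of tracks.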

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Underwater Acoustics Research