Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers

Sharath Adavanne; Archontis Politis; Tuomas Virtanen

arXiv:2111.00030·eess.AS·November 2, 2021

TL;DR

This paper introduces a differentiable training method for deep learning sound source localizers that directly optimizes tracking metrics, improving multi-source localization and tracking without auxiliary information.

Contribution

It adapts a differentiable network approach from video object detection to sound source localization, enabling end-to-end training for multi-source scenarios.

Findings

01 Significant reduction in localization error.

02 Improved detection and tracking metrics.

03 Enhanced multi-source tracking capabilities.

Abstract

Data-based and learning-based sound source localization (SSL) has shown promising results in challenging conditions, and is commonly posed as a classification or a regression problem. Regression-based approaches have certain advantages over classification-based ones, such as continuous direction-of-arrival estimation of static and moving sources. However, multi-source scenarios require multiple regressors, and to date there is no clear training strategy for them that does not rely on auxiliary information such as simultaneous sound classification. We investigate end-to-end training of such methods with a technique recently proposed for video object detectors, adapted to the SSL setting. A differentiable network is constructed that can be plugged into the output of the localizer to solve the optimal assignment between predictions and references, directly optimizing the popular CLEAR-MOT tracking metrics.…
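
To make the assignment step in the abstract concrete, here is a minimal sketch of matching predicted directions of arrival to reference directions for a single frame. It is not the paper's method: it solves the discrete assignment by exhaustive search, which is not differentiable, whereas the abstract describes a differentiable network that takes over this role during training. The function names, the (azimuth, elevation) representation in degrees, and the example values are all assumptions made for illustration.

```python
import math
from itertools import permutations


def angular_distance_deg(doa_a, doa_b):
    """Great-circle angle in degrees between two (azimuth, elevation) pairs in degrees."""
    az_a, el_a = (math.radians(v) for v in doa_a)
    az_b, el_b = (math.radians(v) for v in doa_b)
    cos_angle = (math.sin(el_a) * math.sin(el_b)
                 + math.cos(el_a) * math.cos(el_b) * math.cos(az_a - az_b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def optimal_assignment(predictions, references):
    """Return (pairs, total_cost) for the cheapest one-to-one matching.

    pairs is a list of (prediction_index, reference_index). The search is
    exhaustive, so it only suits the few sources active in one frame.
    """
    n_pred, n_ref = len(predictions), len(references)
    best_pairs, best_cost = [], math.inf
    if n_pred <= n_ref:
        # Every prediction gets a distinct reference; extra references are misses.
        candidates = (list(zip(range(n_pred), perm))
                      for perm in permutations(range(n_ref), n_pred))
    else:
        # Every reference gets a distinct prediction; extra predictions are false alarms.
        candidates = (list(zip(perm, range(n_ref)))
                      for perm in permutations(range(n_pred), n_ref))
    for pairs in candidates:
        cost = sum(angular_distance_deg(predictions[p], references[r])
                   for p, r in pairs)
        if cost < best_cost:
            best_pairs, best_cost = pairs, cost
    return best_pairs, best_cost


# Hypothetical frame with two active sources, (azimuth, elevation) in degrees.
predictions = [(10.0, 0.0), (-90.0, 5.0)]
references = [(-85.0, 0.0), (12.0, 0.0)]
pairs, total_cost = optimal_assignment(predictions, references)
print(pairs)  # [(0, 1), (1, 0)]
```

In a tracking evaluation, the matched pairs feed the localization error, while unmatched references and unmatched predictions count as misses and false positives. For larger numbers of sources, a polynomial-time solver such as the Hungarian algorithm would replace the exhaustive search.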

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Underwater Acoustics Research