An Experimental Study on Joint Modeling for Sound Event Localization and Detection with Source Distance Estimation

Yuxuan Dong; Qing Wang; Hengyi Hong; Ya Jiang; Shi Cheng

arXiv:2501.10755·cs.SD·January 22, 2025

Open Access

TL;DR

This paper proposes three joint modeling approaches for 3D sound event localization and detection (SELD), a task that extends conventional SELD with source distance estimation.

Contribution

It presents methods for integrating source distance estimation with localization and detection; the proposed system ranked first in the DCASE 2024 Challenge Task 3.

Findings

01

Ranked first in the DCASE 2024 Challenge Task 3

02

Demonstrated effectiveness of joint modeling approaches

03

Proposed methods outperform traditional separate models

Abstract

Traditional sound event localization and detection (SELD) tasks typically focus on sound event detection (SED) and direction-of-arrival (DOA) estimation, but these fall short of providing full spatial information about the sound source. The 3D SELD task addresses this limitation by integrating source distance estimation (SDE), allowing for complete spatial localization. We propose three approaches to tackle this challenge: a novel method with independent training and joint prediction, which first treats DOA and distance estimation as separate tasks and then combines them to solve 3D SELD; a dual-branch representation that uses source Cartesian coordinates for simultaneous DOA and distance estimation; and a three-branch structure that jointly models SED, DOA, and SDE within a unified framework. Our proposed method ranked first in the DCASE 2024 Challenge Task 3, demonstrating…
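To make the Cartesian-coordinate idea concrete, the following is a minimal sketch of how a single source position can carry both DOA and distance: the direction of the vector gives the DOA, and its length gives the distance. It is an illustration only, not the authors' implementation. The function names `to_cartesian` and `from_cartesian` are hypothetical, and the angle convention (azimuth measured in the horizontal plane, elevation measured from that plane, both in degrees, distance in metres) is an assumption that the abstract does not state.

```python
import math


def to_cartesian(azimuth_deg, elevation_deg, distance_m):
    """Encode DOA (azimuth, elevation) and distance as one (x, y, z) position.

    Assumed convention: azimuth in the horizontal plane, elevation measured
    from that plane, both in degrees.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance_m * math.cos(el) * math.cos(az)
    y = distance_m * math.cos(el) * math.sin(az)
    z = distance_m * math.sin(el)
    return x, y, z


def from_cartesian(x, y, z):
    """Recover (azimuth_deg, elevation_deg, distance_m) from a position."""
    distance = math.sqrt(x * x + y * y + z * z)
    if distance == 0.0:
        # A zero-length vector has no defined direction.
        raise ValueError("DOA is undefined for a source at the origin")
    azimuth = math.degrees(math.atan2(y, x))
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, z / distance))
    elevation = math.degrees(math.asin(ratio))
    return azimuth, elevation, distance


if __name__ == "__main__":
    position = to_cartesian(30.0, 10.0, 2.5)
    print(position)
    print(from_cartesian(*position))
```

A regression target of this form lets one output vector be decoded into both quantities, which is one plausible reading of "simultaneous DOA and distance estimation" in the abstract.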

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Focus