AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection

Jin Sob Kim, Hyun Joon Park, Wooseok Shin, and Sung Won Han

arXiv:2303.15703 · eess.AS · May 11, 2023 · ICASSP · 1 citation

TL;DR

This paper introduces AD-YOLO, a sound event localization and detection (SELD) method that adapts the YOLO framework with an angular-distance-based output format, allowing it to handle polyphony, including overlapping events of the same class, in unknown environments.

Contribution

The paper presents AD-YOLO, a new SELD approach that improves generalization and polyphony handling through a location-sensitive format inspired by YOLO.

Findings

01. Achieved strong performance on the DCASE 2020-2022 Challenge Task 3 datasets
02. Demonstrated robustness in class-homogeneous polyphony environments, where multiple events of the same class overlap
03. Outperformed existing SELD methods on multiple metrics

Abstract

Sound event localization and detection (SELD) combines the identification of sound events with the estimation of their directions of arrival (DOA). Recently, event-oriented track output formats have been adopted to solve this problem; however, they still have limited generalization to real-world problems in unknown polyphony environments. To address this issue, we propose angular-distance-based multiple SELD (AD-YOLO), an adaptation of the "You Only Look Once" (YOLO) algorithm for SELD. The AD-YOLO format allows the model to learn sound occurrences in a location-sensitive manner by assigning class responsibility to DOA predictions. Hence, the format enables the model to handle the polyphony problem regardless of the number of overlapping sounds. We evaluated AD-YOLO on the DCASE 2020-2022 Challenge Task 3 datasets using four SELD objective metrics. The experimental results show that AD-YOLO…
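
The abstract describes the format only at a high level. As a rough illustration of the two ingredients it names, the sketch below computes the angular distance between DOA unit vectors and applies a YOLO-like rule that makes one output slot responsible for each event. It is a minimal sketch under stated assumptions: the fixed anchor directions, the nearest-anchor rule, and the helper names (`to_unit`, `angular_distance_deg`, `assign_responsibility`) are hypothetical and are not taken from the paper or the official `sadPororo/AD-YOLO` repository.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A DOA represented as a Cartesian unit vector (x, y, z).
Vec3 = Tuple[float, float, float]


def to_unit(azimuth_deg: float, elevation_deg: float) -> Vec3:
    """Convert azimuth/elevation angles in degrees to a Cartesian unit vector."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def angular_distance_deg(u: Vec3, v: Vec3) -> float:
    """Great-circle angle between two unit vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    dot = max(-1.0, min(1.0, dot))  # clamp rounding error so acos stays defined
    return math.degrees(math.acos(dot))


def assign_responsibility(
    anchors: Sequence[Vec3], events: Sequence[Tuple[str, Vec3]]
) -> Dict[int, List[str]]:
    """Hypothetical YOLO-style rule (not the paper's exact rule): each
    ground-truth event makes the angularly nearest anchor slot responsible
    for predicting its class label."""
    responsible: Dict[int, List[str]] = {}
    for label, doa in events:
        distances = [angular_distance_deg(anchor, doa) for anchor in anchors]
        nearest = distances.index(min(distances))
        responsible.setdefault(nearest, []).append(label)
    return responsible


if __name__ == "__main__":
    # Four hypothetical anchor directions on the horizontal plane.
    anchors = [to_unit(az, 0.0) for az in (0, 90, 180, 270)]
    # Two overlapping events of the SAME class arriving from different directions.
    events = [("dog", to_unit(10, 5)), ("dog", to_unit(175, -5))]
    print(assign_responsibility(anchors, events))  # {0: ['dog'], 2: ['dog']}
```

Because responsibility in this toy rule follows direction rather than class identity, the two same-class events land in different slots. That is the intuition behind handling class-homogeneous polyphony; the actual output format, anchors, and training loss are defined in the paper and the official repository.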

Code & Models

Repositories

sadPororo/AD-YOLO (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Underwater Acoustics Research