AD-YOLO: You Look Only Once in Training Multiple Sound Event Localization and Detection
Jin Sob Kim, Hyun Joon Park, Wooseok Shin, and Sung Won Han

TL;DR
This paper introduces AD-YOLO, a sound event localization and detection (SELD) model that handles polyphony and unknown environments by adapting the YOLO framework with an angular-distance-based output format.
Contribution
The paper presents AD-YOLO, a new SELD approach that improves generalization and polyphony handling through a location-sensitive format inspired by YOLO.
Findings
Achieved strong performance on the DCASE 2020-2022 Challenge Task 3 datasets
Demonstrated robustness in class-homogeneous polyphony environments, where multiple events of the same class overlap
Outperformed existing SELD methods on multiple metrics
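The DCASE Task 3 challenges (2020-2022) commonly evaluate SELD with four metrics: a location-aware error rate and F-score, where a detection counts only if its DOA lies within 20° of the reference, plus a class-aware localization error in degrees and a localization recall. The DCASE baseline systems also average them into a single SELD score, where lower is better. The sketch below shows that aggregation as a plain Python function; the input values are made-up placeholders rather than results from the paper, and this summary does not state whether the paper reports the aggregate.

```python
def seld_score(er_20, f_20, le_cd_deg, lr_cd):
    """Average four DCASE-style SELD metrics into one score (lower is better).

    er_20: location-aware error rate, f_20: location-aware F-score (0..1),
    le_cd_deg: class-aware localization error in degrees,
    lr_cd: class-aware localization recall (0..1).
    Error-like terms are used as-is, score-like terms are flipped to 1 - x,
    and the angular error is normalized by 180 degrees.
    """
    return (er_20 + (1.0 - f_20) + le_cd_deg / 180.0 + (1.0 - lr_cd)) / 4.0


# Placeholder inputs for illustration only (not reported results).
example = seld_score(er_20=0.5, f_20=0.6, le_cd_deg=18.0, lr_cd=0.7)
print(f"{example:.3f}")  # prints 0.325
```

A perfect system (ER = 0, F = 1, LE = 0°, LR = 1) scores 0, so a single number can be used for model selection while the four components are still reported separately.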
Abstract
Sound event localization and detection (SELD) combines the identification of sound events with the corresponding directions of arrival (DOA). Recently, event-oriented track output formats have been adopted to solve this problem; however, they still have limited generalization toward real-world problems in an unknown polyphony environment. To address the issue, we propose an angular-distance-based multiple SELD (AD-YOLO), which is an adaptation of the "You Only Look Once" algorithm for SELD. The AD-YOLO format allows the model to learn sound occurrences location-sensitively by assigning class responsibility to DOA predictions. Hence, the format enables the model to handle the polyphony problem, regardless of the number of sound overlaps. We evaluated AD-YOLO on DCASE 2020-2022 challenge Task 3 datasets using four SELD objective metrics. The experimental results show that AD-YOLO…
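To make "assigning class responsibility to DOA predictions" concrete, here is a minimal NumPy sketch under our own assumptions, not the authors' implementation. It assumes a small set of hypothetical predictor reference directions on the unit sphere; each target event is assigned to the predictor whose reference direction is closest in angular distance, and that predictor then carries the event's class and DOA target. The function names, the nearest-direction rule, and the reference directions are all illustrative; the actual AD-YOLO format may define responsibility differently.

```python
import numpy as np


def to_unit_vector(azimuth_deg, elevation_deg):
    """Convert azimuth/elevation in degrees to Cartesian unit vectors."""
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    return np.stack([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)], axis=-1)


def angular_distance_deg(u, v):
    """Great-circle angle in degrees between unit vectors (broadcasts)."""
    cos = np.clip(np.sum(u * v, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))


def assign_responsibility(predictor_dirs, target_dirs, target_classes, num_classes):
    """Hypothetical YOLO-style assignment in angular space.

    Each target event goes to the predictor whose reference direction is
    angularly nearest; that predictor becomes responsible for the event's
    class (one-hot) and its DOA (unit vector). Note: in this toy rule, two
    same-class events nearest to the same predictor overwrite each other.
    """
    num_pred = predictor_dirs.shape[0]
    class_target = np.zeros((num_pred, num_classes))
    doa_target = np.zeros((num_pred, num_classes, 3))
    # Pairwise angular distances, shape (num_targets, num_predictors).
    dist = angular_distance_deg(target_dirs[:, None, :], predictor_dirs[None, :, :])
    nearest = np.argmin(dist, axis=1)
    for t, p in enumerate(nearest):
        c = target_classes[t]
        class_target[p, c] = 1.0
        doa_target[p, c] = target_dirs[t]
    return class_target, doa_target, nearest


# Four hypothetical reference directions on the horizontal plane.
predictors = to_unit_vector(np.array([0, 90, 180, -90]), np.zeros(4))
# Two overlapping events of the SAME class (class 2) from different directions.
targets = to_unit_vector(np.array([10.0, 170.0]), np.array([0.0, 20.0]))
cls_tgt, doa_tgt, nearest = assign_responsibility(
    predictors, targets, np.array([2, 2]), num_classes=3)
print(nearest)  # prints [0 2]
```

Because responsibility is decided by direction rather than by a fixed per-class track slot, the two overlapping class-2 events land on different predictors. This is the kind of class-homogeneous polyphony that track-based formats struggle to represent when the number of same-class overlaps is unknown.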
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Underwater Acoustics Research
