Sound event detection based on auxiliary decoder and maximum probability aggregation for DCASE Challenge 2024 Task 4

Sang Won Son; Jongyeon Park; Hong Kook Kim; Sulaiman Vesal; Jeong Eun Lim

arXiv:2406.12721·eess.AS·June 25, 2024

TL;DR

This paper introduces three methods for sound event detection in the DCASE 2024 Challenge Task 4: an auxiliary decoder, maximum probability aggregation (MPA), and multi-channel input features. Together they aim to improve robustness and adaptability across datasets.

Contribution

The paper presents a novel auxiliary decoder, the MPA technique, and multi-channel input features to enhance SED model performance and compatibility across datasets.

Findings

01. Improved SED performance across datasets.

02. Enhanced feature extraction with auxiliary decoder.

03. Effective alignment of soft labels using MPA.

Abstract

In this report, we propose three novel methods for developing a sound event detection (SED) model for the DCASE 2024 Challenge Task 4. First, we propose an auxiliary decoder attached to the final convolutional block to improve feature extraction capabilities while reducing dependency on embeddings from pre-trained large models. The proposed auxiliary decoder operates independently from the main decoder, enhancing performance of the convolutional block during the initial training stages by assigning a different weight strategy between main and auxiliary decoder losses. Next, to address the time interval issue between the DESED and MAESTRO datasets, we propose maximum probability aggregation (MPA) during the training step. The proposed MPA method enables the model's output to be aligned with soft labels of 1 s in the MAESTRO dataset. Finally, we propose a multi-channel input feature that…
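
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that MPA takes, per class, the maximum frame-level probability inside each 1 s window so that the output matches the 1 s soft-label resolution, and that the main and auxiliary decoder losses are combined as a weighted sum. The function names, the `frames_per_segment` parameter, and the `aux_weight` parameter are hypothetical and do not come from the report.

```python
from typing import List


def max_probability_aggregation(
    frame_probs: List[List[float]], frames_per_segment: int
) -> List[List[float]]:
    """Aggregate frame-level probabilities into segment-level ones.

    frame_probs is indexed as [frame][class]. Each output row holds, per
    class, the maximum probability over one window of frames_per_segment
    frames (assumed here to cover 1 s). A shorter final window is kept.
    """
    if frames_per_segment <= 0:
        raise ValueError("frames_per_segment must be positive")
    segments = []
    for start in range(0, len(frame_probs), frames_per_segment):
        window = frame_probs[start:start + frames_per_segment]
        # zip(*window) groups the values of each class across the window.
        segments.append([max(class_values) for class_values in zip(*window)])
    return segments


def combined_loss(main_loss: float, aux_loss: float, aux_weight: float) -> float:
    """Weighted sum of main and auxiliary decoder losses (an assumption).

    The abstract only says the two losses are weighted differently; the
    exact weighting strategy is not given there.
    """
    return main_loss + aux_weight * aux_loss


# Four frames, two classes, two frames per segment (toy values).
toy_probs = [[0.1, 0.8], [0.4, 0.2], [0.3, 0.9], [0.7, 0.1]]
segment_probs = max_probability_aggregation(toy_probs, frames_per_segment=2)
print(segment_probs)  # [[0.4, 0.8], [0.7, 0.9]]
```

In a training loop, the segment-level output would be compared against the 1 s soft labels, while data with finer annotations would keep using frame-level predictions. The actual frame rate, and therefore the real value of `frames_per_segment`, depends on the model's time resolution, which the abstract does not state.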

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis