FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels

Yang Xiao; Han Yin; Jisheng Bai; Rohan Kumar Das

arXiv:2407.00291 · eess.AS · July 2, 2024

TL;DR

This paper introduces a sound event detection system for DCASE 2024 that handles heterogeneous datasets and missing labels using domain generalization, feature adaptation, and dataset-specific training strategies.

Contribution

It proposes a novel domain generalization approach combining audio transformers and CNNs with mixstyle and dataset-specific loss functions for improved sound event detection.

Findings

1. Achieved superior macro-average pAUC on validation data.

2. Improved polyphonic SED score over baseline methods.

3. Effective handling of heterogeneous datasets with missing labels.

Abstract

This report presents the systems developed and submitted by Fortemedia Singapore (FMSG) and Joint Laboratory of Environmental Sound Sensing (JLESS) for DCASE 2024 Task 4. The task focuses on recognizing event classes and their time boundaries, given that multiple events can be present and may overlap in an audio recording. The novelty this year is a dataset with two sources, making it challenging to achieve good performance without knowing the source of the audio clips during evaluation. To address this, we propose a sound event detection method using domain generalization. Our approach integrates features from bidirectional encoder representations from audio transformers and a convolutional recurrent neural network. We focus on three main strategies to improve our method. First, we apply mixstyle to the frequency dimension to adapt the mel-spectrograms from different domains. Second,…
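The abstract names mixstyle on the frequency dimension but does not give its details. The following is a minimal sketch of one common frequency-wise formulation, not the authors' implementation. It assumes mel-spectrograms shaped (batch, n_mels, n_frames), per-frequency-bin statistics computed over the time axis, and mixing weights drawn from a Beta(alpha, alpha) distribution as in the original MixStyle formulation. The function name `freq_mixstyle` and the defaults `alpha=0.3` and `eps=1e-6` are hypothetical choices for illustration.

```python
import numpy as np


def freq_mixstyle(x, alpha=0.3, eps=1e-6, rng=None):
    """Mix per-frequency-bin statistics between clips in a batch.

    x: array of shape (batch, n_mels, n_frames). The shape convention,
    alpha, and eps are assumptions for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch = x.shape[0]

    # Per-clip, per-frequency-bin mean and standard deviation over time.
    mu = x.mean(axis=2, keepdims=True)
    sigma = np.sqrt(x.var(axis=2, keepdims=True) + eps)

    # Remove each clip's own frequency-wise "style".
    x_norm = (x - mu) / sigma

    # One mixing weight per clip, and a random partner clip for each.
    lam = rng.beta(alpha, alpha, size=(batch, 1, 1))
    perm = rng.permutation(batch)

    # Interpolate the statistics of each clip with those of its partner.
    mu_mix = lam * mu + (1.0 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1.0 - lam) * sigma[perm]

    # Re-apply the mixed style to the normalized content.
    return x_norm * sigma_mix + mu_mix
```

In a training loop, a transform like this would typically be applied only during training and only with some probability, so that clips from the two source datasets exchange frequency statistics while their temporal content is preserved.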

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis
