DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels

Samuele Cornell; Janek Ebbers; Constance Douwes; Irene Martín-Morató; Manu Harju; Annamaria Mesaros; Romain Serizel

arXiv:2406.08056 · eess.AS · June 13, 2024 · 1 cites

Open Access

TL;DR

This paper discusses the DCASE 2024 Task 4 challenge focused on developing sound event detection systems that can handle heterogeneous data with varying annotation quality and missing labels, aiming for robust performance across diverse environments.

Contribution

It introduces a new challenge setup and an updated baseline system to address training with diverse, incomplete, and inconsistent annotations in sound event detection.

Findings

01. Using diverse domain data improves SED performance over single-domain training.

02. The baseline system demonstrates robustness despite missing labels and annotation inconsistencies.

03. The results indicate that heterogeneous training data has potential to yield SED systems that generalize better.

Abstract

The Detection and Classification of Acoustic Scenes and Events Challenge Task 4 aims to advance sound event detection (SED) systems in domestic environments by leveraging training data with different supervision uncertainty. Participants are challenged in exploring how to best use training data from different domains and with varying annotation granularity (strong/weak temporal resolution, soft/hard labels), to obtain a robust SED system that can generalize across different scenarios. Crucially, annotation across available training datasets can be inconsistent and hence sound labels of one dataset may be present but not annotated in the other one and vice-versa. As such, systems will have to cope with potentially missing target labels during training. Moreover, as an additional novelty, systems will also be evaluated on labels with different granularity in order to assess their…
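
One common way to cope with missing target labels during training is to exclude unannotated classes from the loss, so that "not annotated" is never treated as "not present". The sketch below illustrates that idea with a masked binary cross-entropy in plain Python. It is an illustrative assumption, not necessarily the approach taken by the challenge baseline; the function name `masked_bce` and the `mask` convention (1 = class annotated in the clip's source dataset, 0 = label missing) are hypothetical.

```python
import math


def masked_bce(probs, targets, mask, eps=1e-7):
    """Mean binary cross-entropy over annotated classes only.

    probs, targets and mask are equal-length sequences over event classes.
    targets may be hard (0 or 1) or soft (any value in [0, 1]).
    mask[i] is 1 if class i is annotated for this clip and 0 if its label
    is missing (unknown, which is not the same as absent).
    """
    total = 0.0
    count = 0
    for p, t, m in zip(probs, targets, mask):
        if not m:
            continue  # unknown label: contributes nothing to the loss
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
        count += 1
    # If no class is annotated, there is nothing to learn from this clip.
    return total / count if count else 0.0


# Three hypothetical classes; the third is not annotated in this dataset.
loss = masked_bce(probs=[0.9, 0.2, 0.5], targets=[1, 0, 1], mask=[1, 1, 0])
print(loss)
```

Because the third class is masked, the model's prediction for it has no effect on the loss, whereas an unmasked loss would penalize a correct detection of an event that simply was not labeled in that dataset.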

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing