LEAD Dataset: How Can Labels for Sound Event Detection Vary Depending on Annotators?
Naoki Koga, Yoshiaki Bando, and Keisuke Imoto

TL;DR
The LEAD dataset provides multi-annotator sound event labels to analyze how annotation variability affects sound event detection models and to develop more robust approaches.
Contribution
We introduce the LEAD dataset with annotations from 20 different annotators, enabling analysis of label variation and robustness in sound event detection.
Findings
Significant variation exists among annotators' labels.
Analysis reveals how label differences impact model training.
The analysis offers insights into building SED models that are robust to annotation variability.
Abstract
In this paper, we introduce the LargE-scale Annotator's labels for sound event Detection (LEAD) dataset, which is designed to give a better understanding of how strong labels vary in sound event detection (SED). Collecting large-scale strong labels for SED is very time-consuming, so in most cases multiple workers divide up the annotation work to create a single dataset. In general, strong labels created by multiple annotators vary widely in both the type of sound event and the temporal onset/offset. When multiple workers annotate, it is quite difficult to determine a unique strong label, because the dataset contains sounds that can be mistaken for similar classes and sounds whose temporal onset/offset is difficult to distinguish. If the strong labels of SED vary greatly depending on the annotator, the SED model trained on a dataset created by multiple…
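To make the two kinds of variation named in the abstract concrete (disagreement on the event class and on the temporal onset/offset), here is a minimal sketch of how one might score agreement between two annotators' strong labels. It is not the metric used in the paper. The `(onset, offset, label)` tuple format, the 0.1 s frame hop, the function names, and the example labels are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical strong-label format: (onset_sec, offset_sec, class_label).
Event = Tuple[float, float, str]


def active_frames(events: List[Event], label: str, hop: float = 0.1) -> Set[int]:
    """Return the indices of `hop`-second frames covered by events of `label`."""
    frames: Set[int] = set()
    for onset, offset, cls in events:
        if cls != label:
            continue
        # round() guards against float error such as 1.2 / 0.1 == 11.999...
        first = int(round(onset / hop))
        last = int(round(offset / hop))
        frames.update(range(first, last))
    return frames


def frame_iou(a: List[Event], b: List[Event], label: str, hop: float = 0.1) -> float:
    """Frame-level intersection-over-union of two annotators for one class."""
    fa = active_frames(a, label, hop)
    fb = active_frames(b, label, hop)
    union = fa | fb
    if not union:
        return 1.0  # both annotators agree that the class never occurs
    return len(fa & fb) / len(union)


# Two hypothetical annotators labelling the same clip.
annotator_a = [(1.0, 3.0, "dog_bark"), (5.0, 6.0, "speech")]
annotator_b = [(1.2, 2.8, "dog_bark"), (5.0, 6.0, "car")]

for cls in ("dog_bark", "speech", "car"):
    print(cls, frame_iou(annotator_a, annotator_b, cls))
```

In this toy example the two annotators agree on the class "dog_bark" but not on its exact boundaries, which lowers the score only partially, while the segment one annotator calls "speech" and the other calls "car" scores zero for both classes. A per-class score like this separates boundary disagreement from class confusion, which are the two sources of variation the abstract describes.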
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis
