Crowdsourcing strong labels for sound event detection

Irene Martín-Morató; Manu Harju; Annamaria Mesaros

arXiv:2107.12089 · eess.AS · July 27, 2021 · WASPAA

TL;DR

This paper introduces a method that estimates strong labels for sound event detection from crowdsourced weak labels by dividing the annotation task into simple unit tasks. The resulting labels have high precision, although some event instances are missed.

Contribution

A novel approach that estimates strong labels from crowdsourced weak labels by assessing annotator competence and aggregating results, verified with synthetic audio data.
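The idea of turning weak labels into strong ones can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each annotator gives a yes/no answer for one sound class over a short window, that a competence score per annotator is already available, and that a competence-weighted vote per second is a reasonable aggregation rule. The function name `aggregate_weak_labels`, the `threshold` parameter, and all values in the example are hypothetical.

```python
def aggregate_weak_labels(votes, competence, total_seconds, threshold=0.5):
    """Turn windowed yes/no votes for one sound class into (onset, offset) events.

    votes: list of (annotator, window_start, window_end, present), in whole seconds.
    competence: dict mapping annotator -> weight (assumed to be given).
    """
    weight_yes = [0.0] * total_seconds
    weight_all = [0.0] * total_seconds

    # Spread each vote over every second its window covers, weighted by competence.
    for annotator, start, end, present in votes:
        w = competence[annotator]
        for t in range(start, min(end, total_seconds)):
            weight_all[t] += w
            if present:
                weight_yes[t] += w

    # A second is active when the weighted share of "yes" votes reaches the threshold.
    active = [
        weight_all[t] > 0 and weight_yes[t] / weight_all[t] >= threshold
        for t in range(total_seconds)
    ]

    # Merge runs of consecutive active seconds into events.
    events = []
    onset = None
    for t, is_active in enumerate(active):
        if is_active and onset is None:
            onset = t
        elif not is_active and onset is not None:
            events.append((onset, t))
            onset = None
    if onset is not None:
        events.append((onset, total_seconds))
    return events


# Hypothetical example: two reliable annotators and one unreliable one.
competence = {"a": 0.9, "b": 0.8, "c": 0.2}
votes = [
    ("a", 0, 2, False), ("a", 2, 4, True), ("a", 4, 6, False),
    ("b", 0, 2, False), ("b", 2, 4, True), ("b", 4, 6, False),
    ("c", 0, 2, True), ("c", 2, 4, False), ("c", 4, 6, True),
]
print(aggregate_weak_labels(votes, competence, total_seconds=6))
```

In this toy setup the low-competence annotator is outvoted everywhere, so only the window from 2 s to 4 s survives as an event.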

Findings

01. 80% F-score in 1 s segments

02. Up to 89.5% intersection-based F1-score

03. High precision in the generated labels, though not all event instances are recalled

Abstract

Strong labels are a necessity for evaluation of sound event detection methods, but often scarcely available due to the high resources required by the annotation task. We present a method for estimating strong labels using crowdsourced weak labels, through a process that divides the annotation task into simple unit tasks. Based on estimations of annotators' competence, aggregation and processing of the weak labels results in a set of objective strong labels. The experiment uses synthetic audio in order to verify the quality of the resulting annotations through comparison with ground truth. The proposed method produces labels with high precision, though not all event instances are recalled. Detection metrics comparing the produced annotations with the ground truth show 80% F-score in 1 s segments, and up to 89.5% intersection-based F1-score calculated according to the polyphonic sound…
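To make the "F-score in 1 s segments" figure concrete, the following is a minimal single-class sketch of a segment-based F-score. It is an illustration only, not the evaluation code used by the authors: the function names, the rule that a segment counts as active when any part of an event overlaps it, and the example events are all assumptions. The intersection-based metric mentioned in the abstract is not covered here.

```python
import math


def segment_activity(events, n_segments, segment_length=1.0):
    """Mark each fixed-length segment that overlaps at least one (onset, offset) event."""
    active = [False] * n_segments
    for onset, offset in events:
        first = math.floor(onset / segment_length)
        last = math.ceil(offset / segment_length)
        for i in range(first, min(last, n_segments)):
            active[i] = True
    return active


def segment_f_score(reference, estimated, duration, segment_length=1.0):
    """Return (precision, recall, F-score) over fixed-length segments for one class."""
    n = math.ceil(duration / segment_length)
    ref = segment_activity(reference, n, segment_length)
    est = segment_activity(estimated, n, segment_length)

    tp = sum(r and e for r, e in zip(ref, est))
    fp = sum(e and not r for r, e in zip(ref, est))
    fn = sum(r and not e for r, e in zip(ref, est))

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f_score


# Hypothetical example: the estimate lies inside the true event but is shorter.
reference = [(2.0, 6.0)]   # ground-truth event, in seconds
estimated = [(3.0, 5.0)]   # produced annotation, in seconds
print(segment_f_score(reference, estimated, duration=10.0))
```

With these made-up events the estimate covers two of the four reference segments and nothing outside them, giving precision 1.0, recall 0.5, and an F-score of about 0.667. This is the same qualitative pattern the abstract reports: high precision with incomplete recall.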
