Sound Event Detection and Separation: a Benchmark on Desed Synthetic Soundscapes

Nicolas Turpault (MULTISPEECH); Romain Serizel (MULTISPEECH); Scott Wisdom; Hakan Erdogan; John Hershey; Eduardo Fonseca; Prem Seetharaman; Justin Salamon

arXiv:2011.00801·cs.SD·November 3, 2020

TL;DR

This paper benchmarks sound event detection (SED) systems on synthetic soundscapes, analyzing how performance changes with event timing, clip length, reverberation, and non-target sound events.

Contribution

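The paper provides synthetic evaluation sets, each designed to isolate a specific SED challenge, and uses them to evaluate state-of-the-art SED systems. It also identifies sound separation as a promising remedy for the degradation caused by non-target sound events.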

Findings

01

Localizing sound events in time remains a challenge for SED systems

02

Reverberation degrades detection performance

03

Non-target sound events severely degrade detection performance

Abstract

We propose a benchmark of state-of-the-art sound event detection (SED) systems. We designed synthetic evaluation sets to focus on specific sound event detection challenges. We analyze the performance of the submissions to DCASE 2020 task 4 depending on time-related modifications (time position of an event and length of clips), and we study the impact of non-target sound events and reverberation. We show that the localization in time of sound events is still a problem for SED systems. We also show that reverberation and non-target sound events severely degrade the performance of SED systems. In the latter case, sound separation seems like a promising solution.
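
To make the point about localization in time concrete, here is a minimal sketch of an event-based score in which a predicted event only counts if its onset falls close to a reference onset. This is an illustration under stated assumptions, not the paper's evaluation code: the function name `onset_f1`, the 0.2 s collar, the greedy matching, and the example events are all hypothetical, and offsets are ignored for brevity.

```python
from typing import List, Tuple

# An event is (label, onset in seconds, offset in seconds).
Event = Tuple[str, float, float]


def onset_f1(reference: List[Event], predicted: List[Event], collar: float = 0.2) -> float:
    """Event-based F1 with onset-only matching (illustrative assumption).

    A prediction is a true positive if its label matches an unmatched
    reference event and the two onsets differ by at most `collar` seconds.
    """
    unmatched = list(reference)
    true_pos = 0
    for label, onset, _ in predicted:
        for ref in unmatched:
            if ref[0] == label and abs(ref[1] - onset) <= collar:
                unmatched.remove(ref)  # each reference event is matched once
                true_pos += 1
                break
    false_pos = len(predicted) - true_pos
    false_neg = len(unmatched)
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 0.0


# Hypothetical annotations for one clip.
reference = [("Dog", 1.0, 2.0), ("Speech", 4.0, 6.5)]
# Same events detected, onsets within the collar.
well_localized = [("Dog", 1.1, 2.1), ("Speech", 4.05, 6.4)]
# Same events detected, but the "Dog" onset is 0.6 s late.
poorly_localized = [("Dog", 1.6, 2.6), ("Speech", 4.05, 6.4)]

print(onset_f1(reference, well_localized))
print(onset_f1(reference, poorly_localized))
```

In this toy case both systems detect the right classes, yet the late onset turns one detection into a false positive plus a false negative, which is why a system can recognize events correctly and still score poorly on temporal localization.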
