Sound Event Detection and Separation: a Benchmark on Desed Synthetic Soundscapes
Nicolas Turpault (MULTISPEECH), Romain Serizel (MULTISPEECH), Scott Wisdom, Hakan Erdogan, John Hershey, Eduardo Fonseca, Prem Seetharaman, Justin Salamon

TL;DR
This paper benchmarks sound event detection (SED) systems on synthetic soundscapes, analyzing their performance under challenging conditions such as reverberation, non-target sound events, and changes in event position and clip length.
Contribution
It provides synthetic evaluation sets, each targeting a specific SED challenge, and uses them to evaluate state-of-the-art SED systems, pointing to sound separation as a promising remedy for non-target sound events.
Findings
Localization of sound events in time remains a challenge for SED systems
Reverberation degrades detection performance
Non-target sounds significantly impact accuracy
Abstract
We propose a benchmark of state-of-the-art sound event detection (SED) systems. We designed synthetic evaluation sets to focus on specific sound event detection challenges. We analyze the performance of the submissions to DCASE 2021 task 4 depending on time-related modifications (time position of an event and length of clips) and we study the impact of non-target sound events and reverberation. We show that the localization in time of sound events is still a problem for SED systems. We also show that reverberation and non-target sound events severely degrade the performance of the SED systems. In the latter case, sound separation seems like a promising solution.
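To illustrate why localization in time matters for scoring, the following is a minimal sketch of an event-based F1 score with an onset collar. It is not the paper's evaluation code: the function name `event_f1`, the 0.2 s collar, the onset-only greedy matching, and the example events are all assumptions made for illustration.

```python
from typing import List, Tuple

# An event is (label, onset in seconds, offset in seconds).
Event = Tuple[str, float, float]


def event_f1(reference: List[Event], predicted: List[Event],
             collar: float = 0.2) -> float:
    """Event-based F1 using onset-only collar matching.

    Assumption: a prediction counts as correct when its label matches a
    not-yet-matched reference event and the onsets differ by at most
    `collar` seconds. Offsets are ignored in this simplified sketch.
    """
    matched = set()  # indices of reference events already matched
    true_positives = 0
    for label, onset, _ in predicted:
        for i, (ref_label, ref_onset, _) in enumerate(reference):
            if i in matched:
                continue
            if label == ref_label and abs(onset - ref_onset) <= collar:
                matched.add(i)
                true_positives += 1
                break
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: both labels are right, but the second onset is
# 0.5 s late, so only one of the two predictions is counted as correct.
reference = [("Dog", 1.0, 2.0), ("Speech", 3.0, 5.0)]
predicted = [("Dog", 1.1, 2.0), ("Speech", 3.5, 5.0)]
score = event_f1(reference, predicted)
```

In this sketch a system that recognizes every class correctly still loses half of its score because one onset falls outside the collar, which is the kind of sensitivity to time localization the abstract refers to.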
