A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection

Archontis Politis, Sharath Adavanne, Daniel Krause, Antoine Deleforge, Prerak Srivastava, and Tuomas Virtanen

arXiv:2106.06999 · eess.AS · July 6, 2021 · 20 cites

TL;DR

This paper introduces a new challenging dataset for sound event localization and detection that includes directional interferers, ambient noise, and reverberation, along with a baseline model that outperforms previous approaches.

Contribution

The paper presents a new SELD dataset that includes directional interferers, together with a baseline model. The baseline results show both an improvement over the previous baseline and the increased difficulty of the task.

Findings

01. Directional interferers significantly degrade system performance.

02. The baseline model with the ACCDOA representation outperforms previous models.

03. The dataset is more challenging due to its conditions of polyphony and overlapping instances of the same class.
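
The ACCDOA representation named in the second finding is not defined on this page. As it is commonly described (activity-coupled Cartesian direction of arrival), each class gets one 3D Cartesian vector whose direction encodes the DOA and whose length encodes activity. The following is a minimal sketch of that idea, not the baseline's code: the function names, the azimuth/elevation convention, and the 0.5 detection threshold are all assumptions made for illustration.

```python
import numpy as np

def encode_accdoa(active, azimuth_deg, elevation_deg):
    """Hypothetical helper: build per-class ACCDOA-style vectors.

    active:        (n_classes,) array of 0/1 activity flags
    azimuth_deg:   (n_classes,) azimuth in degrees (assumed convention)
    elevation_deg: (n_classes,) elevation in degrees (assumed convention)
    Returns an (n_classes, 3) array: unit DOA vectors scaled by activity.
    """
    az = np.deg2rad(np.asarray(azimuth_deg, dtype=float))
    el = np.deg2rad(np.asarray(elevation_deg, dtype=float))
    unit = np.stack(
        [np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)],
        axis=-1,
    )
    return np.asarray(active, dtype=float)[:, None] * unit

def decode_accdoa(vectors, threshold=0.5):
    """Hypothetical helper: recover activity and DOA from ACCDOA-style vectors.

    A class counts as active when its vector norm exceeds `threshold`
    (0.5 is an assumed value, not taken from this page).
    """
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=-1)
    active = norms > threshold
    # Avoid dividing by zero for silent classes; their angles are meaningless.
    safe_norms = np.where(norms > 0, norms, 1.0)
    unit = vectors / safe_norms[:, None]
    azimuth = np.rad2deg(np.arctan2(unit[:, 1], unit[:, 0]))
    elevation = np.rad2deg(np.arcsin(np.clip(unit[:, 2], -1.0, 1.0)))
    return active, azimuth, elevation

if __name__ == "__main__":
    # Two classes: the first active at azimuth 90, the second silent.
    vecs = encode_accdoa([1, 0], [90.0, 0.0], [0.0, 0.0])
    print(decode_accdoa(vecs))
```

Because detection and localization share one regression target per class, a single output head can serve both subtasks; the angles returned for inactive classes should be ignored.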

Abstract

This report presents the dataset and baseline of Task 3 of the DCASE2021 Challenge on Sound Event Localization and Detection (SELD). The dataset is based on emulation of real recordings of static or moving sound events under real conditions of reverberation and ambient noise, using spatial room impulse responses captured in a variety of rooms and delivered in two spatial formats. The acoustical synthesis remains the same as in the previous iteration of the challenge; however, the new dataset brings more challenging conditions of polyphony and overlapping instances of the same class. The most important difference of the new dataset is the introduction of directional interferers, meaning sound events that are localized in space but do not belong to the target classes to be detected and are not annotated. Since such interfering events are expected in every real-world scenario of SELD, the…
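
The emulation idea in the abstract (a dry sound event placed in a room through a measured spatial room impulse response, with ambient noise added) can be sketched for the simplest case of one static source. This is an illustrative sketch only, not the dataset's generation code: the function name `spatialize_static_event`, the array layouts, and the SNR-based noise scaling are assumptions. A moving source would need time-varying impulse responses, which this sketch does not cover.

```python
import numpy as np

def spatialize_static_event(mono, srir, ambience, snr_db):
    """Hypothetical helper: emulate a static event in a reverberant room.

    mono:     (n_samples,) dry single-channel event
    srir:     (n_taps, n_channels) spatial room impulse response
    ambience: (>= n_samples + n_taps - 1, n_channels) ambient noise, nonzero
    snr_db:   desired ratio of reverberant-event power to noise power, in dB
    Returns the (n_samples + n_taps - 1, n_channels) mixture.
    """
    n_out = len(mono) + srir.shape[0] - 1
    # Convolve the dry event with each channel of the impulse response.
    wet = np.stack(
        [np.convolve(mono, srir[:, ch]) for ch in range(srir.shape[1])],
        axis=1,
    )
    noise = ambience[:n_out]
    # Scale the noise so the mixture meets the requested SNR.
    signal_power = np.mean(wet ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return wet + gain * noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mono = rng.standard_normal(1000)
    srir = rng.standard_normal((64, 4)) * np.exp(-np.arange(64) / 16.0)[:, None]
    ambience = rng.standard_normal((2000, 4))
    mix = spatialize_static_event(mono, srir, ambience, snr_db=20.0)
    print(mix.shape)
```

A directional interferer, as described in the abstract, would be spatialized the same way as a target event; the difference is that it receives no annotation.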

Code & Models

Repositories

sharathadavanne/seld-dcase2021 (Official)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis