Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

Sharath Adavanne; Archontis Politis; Joonas Nikunen; Tuomas Virtanen

arXiv:1807.00129·cs.SD·December 18, 2018


TL;DR

This paper introduces a convolutional recurrent neural network that jointly localizes and detects overlapping sound events in 3D space, demonstrating robustness across various array formats and real-world conditions.

Contribution

The paper presents a novel CNN-RNN model that simultaneously performs sound event detection and 3D localization without array-specific feature extraction, applicable to diverse array geometries.

Findings

01. Higher recall of estimated DOAs compared to baselines.

02. Robust performance in reverberant and low SNR environments.

03. Effective in scenarios with multiple overlapping sound events.

Abstract

In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space. The proposed network takes a sequence of consecutive spectrogram time-frames as input and maps it to two outputs in parallel. As the first output, the sound event detection (SED) is performed as a multi-label classification task on each time-frame producing temporal activity for all the sound event classes. As the second output, localization is performed by estimating the 3D Cartesian coordinates of the direction-of-arrival (DOA) for each sound event class using multi-output regression. The proposed method is able to associate multiple DOAs with respective sound event labels and further track this association with respect to time. The proposed method uses separately the phase and magnitude…
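To make the two parallel outputs concrete, here is a minimal sketch of how one time-frame of output might be decoded: the SED output is thresholded per class, and the Cartesian DOA estimate of each active class is converted to azimuth and elevation. The function name `decode_frame`, the class names, the 0.5 threshold, and the angle convention (azimuth measured in the x-y plane from the x-axis, elevation from that plane) are assumptions for illustration, not details taken from the paper.

```python
import math

def decode_frame(sed_probs, doa_xyz, class_names, threshold=0.5):
    """Decode one time-frame of (hypothetical) SELD network output.

    sed_probs: per-class activity probabilities (multi-label SED output).
    doa_xyz:   per-class (x, y, z) DOA estimates (regression output).
    Returns {class_name: (azimuth_deg, elevation_deg)} for active classes.
    """
    active = {}
    for name, prob, (x, y, z) in zip(class_names, sed_probs, doa_xyz):
        if prob < threshold:
            continue  # class judged inactive, so its DOA estimate is ignored
        # Assumed convention: azimuth in the x-y plane, elevation above it.
        azimuth = math.degrees(math.atan2(y, x))
        elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
        active[name] = (azimuth, elevation)
    return active

# Illustrative values only; these are not results from the paper.
class_names = ["speech", "door_slam", "phone"]
sed_probs = [0.9, 0.2, 0.7]
doa_xyz = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.5, 0.7071)]
result = decode_frame(sed_probs, doa_xyz, class_names)
```

Because each class carries its own DOA vector, two overlapping events of different classes in the same frame each receive a label and a direction, which is the association between DOAs and event labels that the abstract describes.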
