Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features

Sharath Adavanne; Archontis Politis; Tuomas Virtanen

arXiv:1801.09522 · cs.SD · January 30, 2018

TL;DR

This paper introduces a multichannel sound event detection (SED) method whose first layer is a 3D CNN, so the network learns inter- and intra-channel features jointly. Compared with single-channel approaches, it improves recognition accuracy and needs fewer training epochs.

Contribution

The paper presents a stacked convolutional and recurrent neural network (CRNN) for multichannel SED whose first layer is a 3D CNN. With multichannel input, it achieves better SED performance in fewer training epochs than single-channel approaches.

Findings

01

Improved the overall F-score by 7.5% using four-channel first-order Ambisonic audio instead of single-channel audio

02

Reduced the overall error rate by 10% using Ambisonic instead of single-channel audio

03

Recognized 15.6% more sound events in time frames with four overlapping sound sources
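The findings are stated as F-score and error rate, the standard SED metrics. As an illustration only, the sketch below computes the segment-based versions commonly used in DCASE evaluations, assuming reference and estimated class activity are given as binary (segments, classes) matrices. The function name `segment_metrics` and the toy data are hypothetical, and this page does not state the segment length or averaging the authors used, so this is not their evaluation code.

```python
import numpy as np


def segment_metrics(ref, est):
    """Segment-based F-score and error rate for multi-label SED.

    ref, est: binary arrays of shape (n_segments, n_classes), where 1 means
    the class is active in that segment. Assumes at least one active
    reference entry (otherwise the error rate is undefined).
    """
    ref = np.asarray(ref, dtype=bool)
    est = np.asarray(est, dtype=bool)

    # F-score from true/false positives and false negatives pooled over all segments
    tp = np.sum(ref & est)
    fp = np.sum(~ref & est)
    fn = np.sum(ref & ~est)
    f_score = 2 * tp / (2 * tp + fp + fn)

    # Error rate: substitutions, deletions and insertions counted per segment
    fn_seg = np.sum(ref & ~est, axis=1)
    fp_seg = np.sum(~ref & est, axis=1)
    subs = np.minimum(fn_seg, fp_seg).sum()
    dels = np.maximum(0, fn_seg - fp_seg).sum()
    ins = np.maximum(0, fp_seg - fn_seg).sum()
    error_rate = (subs + dels + ins) / np.sum(ref)
    return float(f_score), float(error_rate)


# Toy example: 3 segments, 3 hypothetical classes
ref = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
est = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
f, er = segment_metrics(ref, est)
print(f"F = {f:.3f}, ER = {er:.3f}")  # F = 0.667, ER = 0.400
```

A substitution pairs one missed class with one spurious class in the same segment and counts as a single error, which is why the error rate is not simply FP + FN over the number of reference events; insertions can still push it above 1.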

Abstract

In this paper, we propose a stacked convolutional and recurrent neural network (CRNN) with a 3D convolutional neural network (CNN) in the first layer for the multichannel sound event detection (SED) task. The 3D CNN enables the network to simultaneously learn the inter- and intra-channel features from the input multichannel audio. To evaluate the proposed method, multichannel audio datasets with different numbers of overlapping sound sources are synthesized. Each of these datasets has four-channel first-order Ambisonic, binaural, and single-channel versions, on which the SED performance of the proposed method is compared to study the potential of SED using multichannel audio. A similar study is also done with the binaural and single-channel versions of the real-life recording TUT-SED 2017 development dataset. The proposed method learns to recognize overlapping sound…
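The abstract's central idea is that a 3D convolution kernel extends across the audio-channel axis as well as time and frequency, so one filter can combine information from several channels (inter-channel) while also capturing time-frequency patterns within each channel (intra-channel). The sketch below illustrates this with a plain NumPy "valid" 3D convolution. It assumes the features are arranged as (audio channels, time frames, frequency bands); the function name `conv3d_valid`, the feature sizes, and the kernel sizes are hypothetical choices, not the paper's configuration or the authors' code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_valid(x, kernels):
    """'Valid' 3D cross-correlation, as computed by deep-learning conv layers.

    x:       (A, T, F) features: audio channels x time frames x frequency bands
    kernels: (K, kA, kT, kF) K filters, each spanning kA audio channels
    returns: (K, A - kA + 1, T - kT + 1, F - kF + 1)
    """
    kA, kT, kF = kernels.shape[1:]
    # windows[a, t, f] is the (kA, kT, kF) patch whose corner is at (a, t, f)
    windows = sliding_window_view(x, (kA, kT, kF))
    # Dot every patch with every filter
    return np.einsum("atfijk,nijk->natf", windows, kernels)


rng = np.random.default_rng(0)
# Hypothetical input: 4 first-order Ambisonic channels, 10 frames, 40 bands
foa = rng.standard_normal((4, 10, 40))

# Filters spanning all 4 channels mix them: inter- and intra-channel features
inter = rng.standard_normal((8, 4, 3, 3))
# Filters spanning 1 channel see each channel separately: intra-channel only
intra = rng.standard_normal((8, 1, 3, 3))

y_inter = conv3d_valid(foa, inter)
y_intra = conv3d_valid(foa, intra)
print(y_inter.shape, y_intra.shape)  # (8, 1, 8, 38) (8, 4, 8, 38)
```

With the channel-spanning filters, each output value depends on all four input channels; with single-channel filters, the outputs at one channel depth depend only on that channel. Intermediate kernel depths (for example kA = 2) slide along the channel axis and reuse the same weights for different channel pairs.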
