Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features
Sharath Adavanne, Archontis Politis, Tuomas Virtanen

TL;DR
This paper introduces a 3D CNN-based multichannel sound event detection method that effectively learns inter- and intra-channel features, improving recognition accuracy and training efficiency over single-channel approaches.
Contribution
The paper presents a novel 3D CNN integrated into a CRNN for multichannel SED, demonstrating enhanced performance with fewer training epochs compared to traditional methods.
Findings
Improved overall F-score by 7.5% when using multichannel Ambisonic audio in place of single-channel audio
Reduced overall error rate by 10% under the same comparison
Recognized 15.6% more sound events in time frames with four overlapping sound sources
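The summary does not say how the F-score and error rate are computed. As a rough illustration only, the sketch below assumes the segment-based definitions commonly used in SED evaluation, where the error rate counts substitutions, deletions, and insertions per segment and divides by the number of reference events. The function name `segment_metrics` and the toy labels are hypothetical and not taken from the paper.

```python
def segment_metrics(reference, predicted):
    """Segment-based F-score and error rate (assumed definitions).

    reference, predicted: lists of sets, one set of active class
    labels per time segment.
    """
    tp = fp = fn = 0
    subs = dels = ins = 0
    n_ref = 0
    for ref, pred in zip(reference, predicted):
        seg_tp = len(ref & pred)   # events detected correctly
        seg_fp = len(pred - ref)   # events predicted but not present
        seg_fn = len(ref - pred)   # events present but missed
        tp += seg_tp
        fp += seg_fp
        fn += seg_fn
        # A miss paired with a false alarm in the same segment is a substitution.
        subs += min(seg_fn, seg_fp)
        dels += max(0, seg_fn - seg_fp)
        ins += max(0, seg_fp - seg_fn)
        n_ref += len(ref)
    denom = 2 * tp + fp + fn
    f_score = 2 * tp / denom if denom else 0.0
    error_rate = (subs + dels + ins) / n_ref if n_ref else 0.0
    return f_score, error_rate


# Toy example with four segments and overlapping events in the first one.
reference = [{"speech", "car"}, {"speech"}, {"dog"}, set()]
predicted = [{"speech"}, {"speech", "dog"}, {"car"}, set()]
f_score, error_rate = segment_metrics(reference, predicted)
```

In this toy case the first segment contributes a deletion, the second an insertion, and the third a substitution, which is why the two metrics are reported side by side: the F-score alone does not distinguish these error types.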
Abstract
In this paper, we propose a stacked convolutional and recurrent neural network (CRNN) with a 3D convolutional neural network (CNN) in the first layer for the multichannel sound event detection (SED) task. The 3D CNN enables the network to simultaneously learn the inter- and intra-channel features from the input multichannel audio. In order to evaluate the proposed method, multichannel audio datasets with different numbers of overlapping sound sources are synthesized. Each of these datasets has four-channel first-order Ambisonic, binaural, and single-channel versions, on which the performance of SED using the proposed method is compared to study the potential of SED using multichannel audio. A similar study is also done with the binaural and single-channel versions of the real-life recording TUT-SED 2017 development dataset. The proposed method learns to recognize overlapping sound…
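To make the idea of a 3D convolution over multichannel audio concrete, here is a minimal pure-Python sketch. It assumes the input is arranged as channels x time frames x frequency bins and uses a kernel that spans more than one channel, so each output value combines neighbouring channels (inter-channel) with a time-frequency patch (intra-channel). The tensor sizes, kernel shape, and the helper name `conv3d_valid` are illustrative assumptions, not the configuration used in the paper.

```python
def conv3d_valid(x, kernel):
    """Valid-mode 3D cross-correlation of x[C][T][F] with kernel[kc][kt][kf]."""
    C, T, F = len(x), len(x[0]), len(x[0][0])
    kc, kt, kf = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for c in range(C - kc + 1):
        plane = []
        for t in range(T - kt + 1):
            row = []
            for f in range(F - kf + 1):
                acc = 0.0
                # Sum over the channel, time, and frequency extent of the kernel.
                for dc in range(kc):
                    for dt in range(kt):
                        for df in range(kf):
                            acc += x[c + dc][t + dt][f + df] * kernel[dc][dt][df]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy input: 4 channels, 5 time frames, 6 frequency bins (hypothetical sizes).
C, T, F = 4, 5, 6
x = [[[float(c + t + f) for f in range(F)] for t in range(T)] for c in range(C)]

# All-ones kernel spanning 2 channels, 3 frames, and 3 bins (hypothetical shape).
kernel = [[[1.0 for _ in range(3)] for _ in range(3)] for _ in range(2)]

y = conv3d_valid(x, kernel)
out_shape = (len(y), len(y[0]), len(y[0][0]))
```

A 2D convolution applied per channel would correspond to a kernel with channel extent 1; widening that extent is what lets a single filter respond to differences between channels. A real model would use a learned kernel and a framework layer rather than nested loops.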
