Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features
Sharath Adavanne, Giambattista Parascandolo, Pasi Pertilä, Toni Heittola, Tuomas Virtanen

TL;DR
This paper introduces a multichannel audio-based sound event detection system that leverages spatial and harmonic features with LSTM networks, outperforming mono channel methods in overlapping sound scenarios.
Contribution
The paper proposes combining spatial and harmonic features with an LSTM network for multichannel sound event detection, with the aim of improving recognition of overlapping sounds.
Findings
Spatial and harmonic features improve detection accuracy.
Multichannel approach outperforms mono channel methods.
System shows robustness in overlapping sound scenarios.
Abstract
In this paper, we propose the use of spatial and harmonic features in combination with a long short-term memory (LSTM) recurrent neural network (RNN) for the automatic sound event detection (SED) task. Real-life sound recordings typically contain many overlapping sound events, making them hard to recognize with just mono channel audio. Human listeners successfully recognize mixtures of overlapping sound events by using pitch cues and by exploiting the stereo (multichannel) audio signal available at their ears to spatially localize these events. Traditionally, SED systems have used only mono channel audio; motivated by the human listener, we propose to extend them to use multichannel audio. The proposed SED system is compared against the state-of-the-art mono channel method on the development subset of the TUT sound events detection 2016 database. The usage of spatial and harmonic features…
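To make the idea of a spatial cue concrete, here is a minimal sketch of one common way to measure the time delay between two channels: generalized cross-correlation with phase transform (GCC-PHAT). This is an illustration under stated assumptions, not necessarily the feature extraction used in the paper; the function name `gcc_phat_delay` and the parameter `max_lag` are hypothetical.

```python
import numpy as np


def gcc_phat_delay(x, y, max_lag):
    """Estimate how many samples channel y lags behind channel x (GCC-PHAT).

    Hypothetical helper for illustration; max_lag must be a positive integer
    bounding the delays (in samples) that are searched.
    """
    n = len(x) + len(y)  # zero-pad so the circular correlation does not wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)

    # Cross-power spectrum, normalized to unit magnitude (the PHAT weighting),
    # so that only phase (i.e. delay) information remains.
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12

    cc = np.fft.irfft(cross, n=n)

    # Reorder so the array covers lags -max_lag .. +max_lag in order.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag
```

In a frame-based system, a delay estimate like this would typically be computed per analysis frame and microphone pair, then passed to the classifier alongside spectral and pitch-related features; how the paper itself does this is not specified in the abstract above.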
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
