Two vs. Four-Channel Sound Event Localization and Detection
Julia Wilkins, Magdalena Fuentes, Luca Bondi, Shabnam Ghaffarzadegan, Ali Abavisani, Juan Pablo Bello

TL;DR
This paper compares the performance of sound event localization and detection models using different audio input formats, revealing that 2-channel audio can still effectively localize sound sources despite reduced information.
Contribution
It provides a novel analysis of how binaural and stereo inputs impact SELD performance, especially in complex acoustic scenes, highlighting the viability of simpler audio formats.
Findings
Binaural and stereo inputs perform well in lateral localization.
Performance degrades with fewer audio channels.
Scene complexity affects localization accuracy.
Abstract
Sound event localization and detection (SELD) systems estimate both the direction-of-arrival (DOA) and class of sound sources over time. In the DCASE 2022 SELD Challenge (Task 3), models are designed to operate in a 4-channel setting. While beneficial to further the development of SELD systems using a multichannel recording setup such as first-order Ambisonics (FOA), most consumer electronics devices are rarely able to record using more than two channels. For this reason, in this work we investigate the performance of the DCASE 2022 SELD baseline model using three audio input representations: FOA, binaural, and stereo. We perform a novel comparative analysis illustrating the effect of these audio input representations on SELD performance. Crucially, we show that binaural and stereo (i.e., 2-channel) audio-based SELD models are still able to localize and detect sound sources laterally…
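As a rough illustration of why a 2-channel format can keep lateral cues while losing other directional information, the sketch below collapses a first-order Ambisonics encoding to a stereo pair. It is not the conversion used in the paper. It assumes ACN channel order (W, Y, Z, X) with SN3D normalization and a pair of virtual cardioid microphones pointing left and right; the function names `foa_gains` and `stereo_gains` are hypothetical.

```python
import math


def foa_gains(azimuth_deg, elevation_deg=0.0):
    """Encoding gains (W, Y, Z, X) for a plane-wave source.

    Assumes ACN order with SN3D normalization; azimuth is measured
    counter-clockwise from the front, so +90 degrees is to the left.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = 1.0
    y = math.sin(az) * math.cos(el)  # left-right axis
    z = math.sin(el)                 # up-down axis
    x = math.cos(az) * math.cos(el)  # front-back axis
    return w, y, z, x


def stereo_gains(azimuth_deg, elevation_deg=0.0):
    """Left/right gains from two virtual cardioids aimed at +90 and -90 degrees.

    Only W and Y are used, so the front-back (X) and up-down (Z)
    components are discarded in this illustrative decode.
    """
    w, y, _z, _x = foa_gains(azimuth_deg, elevation_deg)
    left = 0.5 * (w + y)
    right = 0.5 * (w - y)
    return left, right


if __name__ == "__main__":
    for az in (90, -90, 0, 180):
        left, right = stereo_gains(az)
        print(f"azimuth {az:>4} deg -> left {left:.2f}, right {right:.2f}")
```

Under these assumptions, a source on the left or right produces clearly different channel gains, while a source directly in front and one directly behind produce identical gains. This simple model captures only level differences; real stereo and binaural recordings also carry time and spectral cues.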
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Underwater Acoustics Research
