TASK3 DCASE2021 Challenge: Sound event localization and detection using squeeze-excitation residual CNNs
Javier Naranjo-Alcazar, Sergi Perez-Castanos, Pedro Zuccarello, Francesc J. Ferri, Maximo Cobos

TL;DR
This paper investigates the impact of squeeze-excitation residual CNNs on sound event localization and detection, demonstrating performance improvements across datasets in the DCASE2021 challenge.
Contribution
It introduces the application of squeeze-excitation techniques within residual CNNs for SELD, extending previous work and evaluating their effectiveness on multiple datasets.
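For readers unfamiliar with the technique, the following is a minimal NumPy sketch of squeeze-excitation recalibration applied to the output of a residual branch. It is an illustration under stated assumptions, not the authors' architecture: the function name `se_residual`, the weight names `w_reduce` and `w_expand`, the reduction ratio, and the `(channels, time, frequency)` layout are all hypothetical, and the convolutional layers that would produce `branch` are not shown.

```python
import numpy as np

def se_residual(shortcut, branch, w_reduce, w_expand):
    """Squeeze-excitation recalibration followed by a residual addition.

    shortcut, branch: arrays of shape (channels, time, freq).
        `branch` stands in for the output of the convolutional layers
        (not implemented here; this is an assumption of the sketch).
    w_reduce: (channels // r, channels) bottleneck weights (hypothetical).
    w_expand: (channels, channels // r) expansion weights (hypothetical).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = branch.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: bottleneck with ReLU, then sigmoid gates in (0, 1).
    h = np.maximum(w_reduce @ z, 0.0)                 # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w_expand @ h)))         # shape (C,)
    # Scale each channel of the branch by its gate, then add the shortcut.
    out = shortcut + branch * s[:, None, None]
    return out, s

# Usage with random data: 8 channels, reduction ratio 4 (both illustrative).
rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.standard_normal((C, 10, 16))
conv_out = rng.standard_normal((C, 10, 16))
out, gates = se_residual(
    x,
    conv_out,
    0.1 * rng.standard_normal((C // r, C)),
    0.1 * rng.standard_normal((C, C // r)),
)
print(out.shape, gates.shape)
```

The gates depend only on per-channel averages, so the block adds few parameters relative to the convolutions it recalibrates.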
Findings
Performance improvement over baseline on MIC dataset
Enhanced system accuracy with squeeze-excitation residual CNNs
Study confirms dataset-specific benefits of the technique
Abstract
Sound event localisation and detection (SELD) is a problem in the field of automatic listening that aims at the temporal detection and localisation (direction of arrival estimation) of sound events within an audio clip, usually of long duration. Due to the amount of data present in the datasets related to this problem, solutions based on deep learning have positioned themselves at the top of the state of the art. Most solutions are based on 2D representations of the audio (different spectrograms) that are processed by a convolutional-recurrent network. The motivation of this submission is to study the squeeze-excitation technique in the convolutional part of the network and how it improves the performance of the system. This study is based on the one carried out by the same team last year. This year, it has been decided to study how this technique improves each of the datasets (last…
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
