TASK3 DCASE2021 Challenge: Sound event localization and detection using squeeze-excitation residual CNNs

Javier Naranjo-Alcazar; Sergi Perez-Castanos; Pedro Zuccarello; Francesc J. Ferri; Maximo Cobos

arXiv:2107.14561 · cs.SD · August 2, 2021

Open Access

TL;DR

This DCASE2021 Task 3 submission studies how applying squeeze-excitation in the convolutional part of a convolutional-recurrent network affects sound event localization and detection (SELD) performance, examining each of the challenge datasets separately.

Contribution

The submission applies squeeze-excitation within the residual CNN stage of a SELD system, extending the same team's study from the previous year and evaluating how the technique performs on each dataset.

Findings

01 Performance improvement over baseline on MIC dataset

02 Enhanced system accuracy with squeeze-excitation residual CNNs

03 Study confirms dataset-specific benefits of the technique

Abstract

Sound event localisation and detection (SELD) is a problem in the field of automatic listening that aims at the temporal detection and localisation (direction of arrival estimation) of sound events within an audio clip, usually of long duration. Due to the amount of data present in the datasets related to this problem, solutions based on deep learning have positioned themselves at the top of the state of the art. Most solutions are based on 2D representations of the audio (different spectrograms) that are processed by a convolutional-recurrent network. The motivation of this submission is to study the squeeze-excitation technique in the convolutional part of the network and how it improves the performance of the system. This study is based on the one carried out by the same team last year. This year, it has been decided to study how this technique improves each of the datasets (last…
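To make the squeeze-excitation idea mentioned in the abstract concrete, the following is a minimal NumPy sketch of a generic squeeze-excitation residual block acting on a spectrogram-like feature map of shape (channels, time, frequency). It is an illustration under stated assumptions, not the authors' network: the function names, the reduction ratio `r`, the random weights, and the 1x1 channel-mixing stand-in for the convolutional branch are all hypothetical, and the specific block configuration used in the submission may differ.

```python
import numpy as np

def squeeze_excite(x, w1, b1, w2, b2):
    """Channel-wise squeeze-excitation on a feature map x of shape (C, T, F)."""
    # Squeeze: global average over time and frequency, one value per channel.
    z = x.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: bottleneck layer with ReLU, then sigmoid gates in (0, 1).
    h = np.maximum(0.0, w1 @ z + b1)             # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))     # shape (C,)
    # Scale: reweight each channel by its gate.
    return x * s[:, None, None], s

def se_residual_block(x, w_mix, se_params):
    """Residual block x + SE(F(x)); F is a hypothetical 1x1 conv + ReLU stand-in."""
    f = np.maximum(0.0, np.einsum("oc,ctf->otf", w_mix, x))
    scaled, gates = squeeze_excite(f, *se_params)
    return x + scaled, gates

# Toy input: 8 channels, 10 time frames, 16 frequency bins (assumed sizes).
rng = np.random.default_rng(0)
C, T, F, r = 8, 10, 16, 4
x = rng.standard_normal((C, T, F))
w_mix = rng.standard_normal((C, C)) * 0.1
se_params = (
    rng.standard_normal((C // r, C)), np.zeros(C // r),   # reduce C -> C // r
    rng.standard_normal((C, C // r)), np.zeros(C),        # expand C // r -> C
)
y, gates = se_residual_block(x, w_mix, se_params)
print(y.shape, gates.shape)
```

The gates are computed from a global summary of each channel, so the block can emphasise or suppress whole feature channels before the result is added back to the input through the residual connection. In a convolutional-recurrent SELD system, blocks of this kind would sit in the convolutional front end, ahead of the recurrent layers.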

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis