Dynamic Kernel Convolution Network with Scene-dedicate Training for Sound Event Localization and Detection

Siwei Huang; Jianfeng Chen; Jisheng Bai; Yafei Jia; Dongzhe Zhang

arXiv:2307.08239 · eess.AS · July 18, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a novel sound event localization and detection system that uses dynamic kernel convolution and scene-specific training strategies to improve performance in realistic spatial sound environments, outperforming existing methods.

Contribution

The paper presents a dynamic kernel convolution module and scene-dedicated training strategies integrated into a SELD system, enhancing adaptability and generalization in complex real-world sound scenes.
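One common way to let a network choose between receptive fields per channel is selective-kernel-style attention: several convolution branches with different kernel sizes run in parallel, and a softmax over the branches produces per-channel mixing weights. The sketch below illustrates only that fusion step in plain Python. It is an assumption about how such a module could work, not the authors' implementation; `fuse_branches` and the example logits are hypothetical, and in a real network the logits would come from learned layers.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_branches(branch_feats, branch_logits):
    """Fuse K branch outputs with per-channel softmax weights.

    branch_feats[k][c]  : list of feature values from branch k, channel c
    branch_logits[k][c] : hypothetical attention logit for branch k, channel c
    """
    num_branches = len(branch_feats)
    num_channels = len(branch_feats[0])
    fused = []
    for c in range(num_channels):
        # Weights sum to 1 across branches, separately for each channel.
        weights = softmax([branch_logits[k][c] for k in range(num_branches)])
        length = len(branch_feats[0][c])
        fused.append([
            sum(weights[k] * branch_feats[k][c][i] for k in range(num_branches))
            for i in range(length)
        ])
    return fused

# Two branches (say, a small and a large kernel), two channels, two values each.
small = [[1.0, 2.0], [0.0, 0.0]]
large = [[3.0, 4.0], [8.0, 8.0]]
# Channel 0 weights both branches equally; channel 1 strongly favours the large kernel.
logits = [[0.0, -50.0], [0.0, 50.0]]
fused = fuse_branches([small, large], logits)
```

With equal logits a channel is the plain average of its branches, while a large logit gap makes it follow a single branch, which is the sense in which the effective receptive field varies per channel.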

Findings

01 Outperforms fixed-kernel convolution SELD systems

02 Achieved an SELD score of 0.348 on the DCASE dataset

03 Surpassed state-of-the-art methods in the task
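For context on the SELD score above, here is a minimal sketch of how the aggregate is usually computed, assuming the DCASE Task 3 convention of averaging four normalized terms: error rate, one minus F-score, localization error divided by 180 degrees, and one minus localization recall. Lower is better. The function name and the input values are illustrative only and are not results from the paper.

```python
def seld_score(er, f_score, le_deg, lr):
    """Aggregate SELD score under the assumed DCASE convention (lower is better).

    er      : location-dependent error rate
    f_score : location-dependent F-score, in [0, 1]
    le_deg  : class-dependent localization error, in degrees
    lr      : class-dependent localization recall, in [0, 1]
    """
    return (er + (1.0 - f_score) + le_deg / 180.0 + (1.0 - lr)) / 4.0

# Illustrative inputs, not values reported by the authors.
example = seld_score(er=0.5, f_score=0.5, le_deg=18.0, lr=0.5)
```

A perfect system (zero error rate, F-score and recall of 1, zero localization error) scores 0 under this definition.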

Abstract

DNN-based methods have shown high performance in sound event localization and detection (SELD). In real spatial sound scenes, however, reverberation and the imbalanced presence of various sound events increase the complexity of the SELD task. In this paper, we propose an effective SELD system for real spatial scenes. First, a dynamic kernel convolution module is introduced after the convolution blocks to adaptively model the channel-wise features with different receptive fields. Second, we incorporate the SELDnet and EINv2 frameworks into the proposed SELD system with multi-track ACCDOA. Moreover, two scene-dedicated strategies are introduced into the training stage to improve the generalization of the system in realistic spatial sound scenes. Finally, we apply data augmentation methods to extend the dataset using channel rotation and spatial data synthesis. Four joint metrics are…
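The abstract names channel rotation as an augmentation but does not spell out the transformations. As a hedged illustration, the sketch below shows one such rotation for first-order Ambisonics (FOA) audio, assuming channels ordered (W, Y, Z, X) and an ideal plane-wave encoding: rotating the scene by +90 degrees in azimuth amounts to swapping X and Y with one sign flip, and the label azimuth shifts accordingly. All function names are hypothetical, and the paper may use a different or larger set of rotations.

```python
import math

def encode_foa(azimuth_deg, elevation_deg, gain=1.0):
    """Ideal FOA gains for a plane wave, ordered (W, Y, Z, X) by assumption."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return [
        gain,                                # W: omnidirectional
        gain * math.sin(az) * math.cos(el),  # Y
        gain * math.sin(el),                 # Z
        gain * math.cos(az) * math.cos(el),  # X
    ]

def rotate_foa_90(frame):
    """Rotate one (W, Y, Z, X) frame by +90 degrees in azimuth."""
    w, y, z, x = frame
    # azimuth -> azimuth + 90 gives: new Y = old X, new X = -old Y.
    return [w, x, z, -y]

def rotate_label_90(azimuth_deg, elevation_deg):
    """Apply the same rotation to a DOA label, wrapping azimuth to [-180, 180)."""
    new_az = (azimuth_deg + 90.0 + 180.0) % 360.0 - 180.0
    return new_az, elevation_deg

# Rotating the audio and the label together keeps them consistent.
rotated_audio = rotate_foa_90(encode_foa(30.0, 10.0))
rotated_label = rotate_label_90(30.0, 10.0)
expected_audio = encode_foa(*rotated_label)
```

Because the transformation only permutes and negates channels, it creates new source directions without resynthesizing audio, which is why it is a cheap way to enlarge a spatial dataset.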

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Underwater Acoustics Research

Methods: Convolution