Dynamic Kernel Convolution Network with Scene-dedicate Training for Sound Event Localization and Detection
Siwei Huang, Jianfeng Chen, Jisheng Bai, Yafei Jia, Dongzhe Zhang

TL;DR
This paper introduces a novel sound event localization and detection system that uses dynamic kernel convolution and scene-specific training strategies to improve performance in realistic spatial sound environments, outperforming existing methods.
Contribution
The paper presents a dynamic kernel convolution module and scene-dedicated training strategies integrated into a SELD system, enhancing adaptability and generalization in complex real-world sound scenes.
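As a rough illustration of what "dynamic kernel" can mean, the sketch below mixes two convolution branches with different receptive fields, using per-channel attention weights computed from a pooled descriptor. It is a simplified, selective-kernel-style toy in pure Python and not the authors' module. The function names, the 1-D setting, and the scalar `gate_weights` (standing in for learned gating layers) are all assumptions made for illustration.

```python
import math

def conv1d_same(x, kernel):
    """Zero-padded 'same' 1-D convolution of one channel (odd-length kernel)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]

def softmax(values):
    """Numerically stable softmax over a short list."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_kernel_mix(features, kernels, gate_weights):
    """Mix branches with different kernel sizes, channel by channel.

    features     : list of channels, each a list of floats
    kernels      : one odd-length kernel per branch (different receptive fields)
    gate_weights : gate_weights[b][c], a hypothetical learned scalar that maps
                   the pooled descriptor of channel c to a logit for branch b
    """
    # Run every channel through every branch.
    branches = [[conv1d_same(ch, k) for ch in features] for k in kernels]
    mixed, attention = [], []
    for c in range(len(features)):
        n = len(features[c])
        # Fuse the branches and squeeze them to one descriptor per channel.
        fused = [sum(b[c][t] for b in branches) for t in range(n)]
        pooled = sum(fused) / n
        # One attention weight per branch, normalised across branches.
        weights = softmax([gate_weights[b][c] * pooled
                           for b in range(len(kernels))])
        attention.append(weights)
        mixed.append([sum(w * branches[b][c][t] for b, w in enumerate(weights))
                      for t in range(n)])
    return mixed, attention

# Toy input: one channel, a size-1 kernel and a size-3 averaging kernel.
features = [[1.0, 2.0, 3.0, 4.0]]
kernels = [[1.0], [1 / 3, 1 / 3, 1 / 3]]
mixed, attention = dynamic_kernel_mix(features, kernels, [[0.0], [0.0]])
```

With all gate weights at zero the two branches are weighted equally. Non-zero gates shift the weighting toward one receptive field depending on the pooled input, which is the input-dependent behaviour the word "dynamic" refers to here.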
Findings
Outperforms fixed-kernel convolution SELD systems
Achieved an SELD score of 0.348 on the DCASE dataset
Surpassed state-of-the-art methods in the task
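For context on the SELD score reported above, the following sketch shows the aggregate commonly used in the DCASE SELD task: the mean of four error-like terms, where lower is better. That the paper uses exactly this aggregate is an assumption, and the input values in the example are made up for illustration; they are not the paper's results.

```python
def seld_score(error_rate, f_score, loc_error_deg, loc_recall):
    """Aggregate SELD score as commonly defined for the DCASE SELD task.

    error_rate    : location-dependent error rate (lower is better)
    f_score       : location-dependent F-score in [0, 1] (higher is better)
    loc_error_deg : class-dependent localization error in degrees, 0..180
    loc_recall    : class-dependent localization recall in [0, 1]
    """
    return (error_rate
            + (1.0 - f_score)
            + loc_error_deg / 180.0
            + (1.0 - loc_recall)) / 4.0

# Hypothetical inputs, chosen only to show the arithmetic.
example = seld_score(error_rate=0.5, f_score=0.5,
                     loc_error_deg=18.0, loc_recall=0.5)
```

Each term is scaled so that 0 is ideal, which is why the F-score and recall enter as one minus their value and the localization error is divided by 180 degrees.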
Abstract
DNN-based methods have shown high performance in sound event localization and detection (SELD). In real spatial sound scenes, however, reverberation and the imbalanced presence of various sound events increase the complexity of the SELD task. In this paper, we propose an effective SELD system for real spatial scenes. First, a dynamic kernel convolution module is introduced after the convolution blocks to adaptively model the channel-wise features with different receptive fields. Second, we incorporate the SELDnet and EINv2 frameworks into the proposed SELD system with multi-track ACCDOA. Moreover, two scene-dedicated strategies are introduced into the training stage to improve the generalization of the system in realistic spatial sound scenes. Finally, we apply data augmentation methods to extend the dataset using channel rotation and spatial data synthesis. Four joint metrics are…
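The channel rotation augmentation mentioned in the abstract can be illustrated with first-order Ambisonics (FOA). The sketch below assumes FOA input in W, Y, Z, X channel order with unit gain on W and a single plane-wave source. The abstract does not say which rotations or formats the authors use, so the 90-degree azimuth rotation and all function names here are illustrative assumptions.

```python
import math

def encode_foa(azimuth_deg, elevation_deg, amplitude=1.0):
    """Encode one plane-wave sample into FOA channels [W, Y, Z, X]."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return [amplitude,
            amplitude * math.sin(az) * math.cos(el),   # Y
            amplitude * math.sin(el),                  # Z
            amplitude * math.cos(az) * math.cos(el)]   # X

def rotate_foa_90(frame):
    """Rotate the sound field by +90 degrees in azimuth by remapping channels.

    sin(az + 90) = cos(az)  -> new Y is the old X
    cos(az + 90) = -sin(az) -> new X is the negated old Y
    """
    w, y, z, x = frame
    return [w, x, z, -y]

def rotate_label_90(azimuth_deg, elevation_deg):
    """Apply the same rotation to the direction label, wrapped to [-180, 180)."""
    new_az = (azimuth_deg + 90.0 + 180.0) % 360.0 - 180.0
    return new_az, elevation_deg

# The rotated audio should match a source encoded at the rotated label.
frame = encode_foa(30.0, 10.0)
rotated = rotate_foa_90(frame)
expected = encode_foa(*rotate_label_90(30.0, 10.0))
```

Because the transform is a signed permutation of channels, it creates new spatial examples without resampling or filtering the audio, as long as the labels are rotated in the same way.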
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Underwater Acoustics Research
Methods: Convolution
