MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection
Da Mu, Zhicheng Zhang, Haobo Yue

TL;DR
This paper introduces MFF-EINV2, a novel multi-scale feature fusion network that enhances sound event localization and detection by effectively integrating spectral, spatial, and temporal features, achieving state-of-the-art results.
Contribution
It proposes a three-stage multi-scale feature fusion module integrated into EINV2 to improve feature extraction across domains for SELD.
Findings
Achieves state-of-the-art performance on the 2022 and 2023 DCASE Challenge Task 3 datasets
Effectively extracts multi-scale spectral, spatial, and temporal features
Outperforms previous methods in SELD tasks
Abstract
Sound Event Localization and Detection (SELD) involves detecting and localizing sound events using multichannel sound recordings. The previously proposed Event-Independent Network V2 (EINV2) has achieved outstanding performance on SELD. However, it still faces challenges in effectively extracting features across the spectral, spatial, and temporal domains. This paper proposes a three-stage network structure, the Multi-scale Feature Fusion (MFF) module, to fully extract multi-scale features across these three domains. The MFF module uses a parallel subnetwork architecture to generate multi-scale spectral and spatial features. The TF-Convolution Module is employed to provide multi-scale temporal features. We incorporate MFF into EINV2 and term the proposed method MFF-EINV2. Experimental results on the 2022 and 2023 DCASE Challenge Task 3 datasets show the effectiveness of…
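The idea of parallel subnetworks producing multi-scale features can be illustrated with a minimal sketch. This is not the authors' MFF module. It assumes, purely for illustration, that each branch works at a different frequency resolution and that the branches are fused by resampling to a common resolution and averaging. The function names and the `factors` parameter are hypothetical. The learned convolutions of a real network and the TF-Convolution Module are not modelled.

```python
# Illustrative sketch only: NOT the authors' MFF implementation.
# Assumption: branches differ in frequency resolution and are fused by
# resampling to full resolution and taking the element-wise mean.

def downsample_freq(feat, factor):
    """Average-pool each time frame along the frequency axis by `factor`."""
    return [
        [sum(frame[i:i + factor]) / factor for i in range(0, len(frame), factor)]
        for frame in feat
    ]


def upsample_freq(feat, factor):
    """Nearest-neighbour upsampling along the frequency axis by `factor`."""
    return [[value for value in frame for _ in range(factor)] for frame in feat]


def fuse_multiscale(feat, factors=(1, 2, 4)):
    """Run parallel branches at several frequency resolutions and fuse them.

    `feat` is a list of time frames, each a list of frequency-bin values.
    The number of bins must be divisible by every entry of `factors`.
    """
    n_bins = len(feat[0])
    if any(n_bins % f for f in factors):
        raise ValueError("number of frequency bins must be divisible by all factors")

    # Each branch reduces the resolution, then maps back to full resolution.
    # A real network would apply learned layers inside each branch.
    branches = [upsample_freq(downsample_freq(feat, f), f) for f in factors]

    # Fusion by element-wise mean across branches (an assumption).
    return [
        [sum(b[t][k] for b in branches) / len(branches) for k in range(n_bins)]
        for t in range(len(feat))
    ]


if __name__ == "__main__":
    spectrogram = [[1.0, 3.0, 5.0, 7.0], [2.0, 2.0, 2.0, 2.0]]
    print(fuse_multiscale(spectrogram))
```

The output keeps the input's time-by-frequency shape, while each value mixes fine-grained and coarse-grained views of the same frame.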
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Diverse Musicological Studies
Methods: Multimodal Fuzzy Fusion Framework
