Multi-dimensional frequency dynamic convolution with confident mean teacher for sound event detection
Shengchang Xiao, Xueshuai Zhang, Pengyuan Zhang

TL;DR
This paper introduces multi-dimensional frequency dynamic convolution (MFDConv) with a novel attention mechanism to improve sound event detection by better capturing time-frequency features, and proposes a confident mean teacher to enhance pseudo-label accuracy.
Contribution
It presents a new multi-dimensional frequency dynamic convolution and a confident mean teacher framework, advancing feature extraction and pseudo-label reliability in sound event detection.
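The sketch below shows the mean-teacher ingredients mentioned above. This summary does not specify the confidence rule, so the sketch assumes a simple one:

- The teacher is an exponential moving average (EMA) of the student.
- The teacher's frame probabilities are binarised at a hypothetical threshold of 0.5.
- Each hard label is weighted by the teacher's confidence in it.

The function names `ema_update`, `confident_targets` and `confident_consistency_loss`, the threshold and the weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import math


def ema_update(teacher, student, decay=0.999):
    """Standard mean-teacher step: teacher weights track an EMA of the student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def confident_targets(teacher_probs, threshold=0.5):
    """Binarise teacher probabilities and attach a confidence weight.

    Assumed (hypothetical) rule: the confidence is the teacher's probability
    of the chosen label, i.e. p for positives and 1 - p for negatives.
    """
    labels, weights = [], []
    for p in teacher_probs:
        hard = 1.0 if p >= threshold else 0.0
        labels.append(hard)
        weights.append(p if hard else 1.0 - p)
    return labels, weights


def confident_consistency_loss(student_probs, teacher_probs, threshold=0.5, eps=1e-7):
    """Confidence-weighted binary cross-entropy against hard teacher labels."""
    labels, weights = confident_targets(teacher_probs, threshold)
    total = 0.0
    for q, y, w in zip(student_probs, labels, weights):
        q = min(max(q, eps), 1.0 - eps)  # clip to keep log() finite
        total += -w * (y * math.log(q) + (1.0 - y) * math.log(1.0 - q))
    return total / len(student_probs)


# Usage: frame-level probabilities for one event class
teacher = [0.9, 0.2, 0.55, 0.05]
student = [0.8, 0.3, 0.4, 0.1]
labels, weights = confident_targets(teacher)
print(labels)  # [1.0, 0.0, 1.0, 0.0]
loss = confident_consistency_loss(student, teacher)
```

Weighting by confidence means that frames the teacher is unsure about (such as the 0.55 frame) contribute less to the student's loss than frames it labels decisively.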
Findings
Achieved PSDS1 of 0.470 on the DESED dataset
Achieved PSDS2 of 0.692 on the DESED dataset
Improved feature extraction with MFDConv and more accurate pseudo-labels with the confident mean teacher
Abstract
Recently, convolutional neural networks (CNNs) have been widely used in sound event detection (SED). However, traditional convolution is limited in its ability to learn time-frequency representations of different sound events. To address this issue, we propose multi-dimensional frequency dynamic convolution (MFDConv), a new design that endows convolutional kernels with frequency-adaptive dynamic properties along multiple dimensions. MFDConv uses a novel multi-dimensional attention mechanism with a parallel strategy to learn complementary frequency-adaptive attentions, which substantially strengthen the feature extraction ability of convolutional kernels. Moreover, to improve the performance of the mean teacher, we propose the confident mean teacher, which increases the accuracy of the teacher's pseudo-labels and trains the student with high-confidence labels. Experimental results show…
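The MFDConv idea can be illustrated with a toy, pure-Python sketch under stated assumptions. For every frequency bin, a descriptor (channel means over time) feeds four attention heads in parallel:

- a softmax over K basis kernels;
- a sigmoid gate over input channels;
- a sigmoid gate over output channels;
- a sigmoid gate over the 3×3 kernel positions.

The product of these attentions scales a frequency-specific aggregate of the basis kernels. The class name `MFDConvSketch`, the single-linear-layer heads and the choice of these four dimensions are hypothetical simplifications. The paper's exact attention design is not given in this summary.

```python
import math
import random


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(mat, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


class MFDConvSketch:
    """Toy frequency-adaptive 3x3 convolution (stride 1, zero padding 1).

    Hypothetical simplification: each attention head is one linear layer applied
    to a per-frequency-bin descriptor; all heads run in parallel.
    """

    def __init__(self, c_in, c_out, num_bases=4, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-0.5, 0.5)

        self.c_in, self.c_out, self.k = c_in, c_out, num_bases
        # K basis kernels, indexed bases[k][o][c][i][j]
        self.bases = [[[[[rand() for _ in range(3)] for _ in range(3)]
                        for _ in range(c_in)] for _ in range(c_out)]
                      for _ in range(num_bases)]
        self.w_kernel = [[rand() for _ in range(c_in)] for _ in range(num_bases)]
        self.w_in = [[rand() for _ in range(c_in)] for _ in range(c_in)]
        self.w_out = [[rand() for _ in range(c_in)] for _ in range(c_out)]
        self.w_spatial = [[rand() for _ in range(c_in)] for _ in range(9)]

    def attentions(self, desc):
        alpha = softmax(matvec(self.w_kernel, desc))                 # over K kernels
        beta = [sigmoid(a) for a in matvec(self.w_in, desc)]         # input channels
        gamma = [sigmoid(a) for a in matvec(self.w_out, desc)]       # output channels
        delta = [sigmoid(a) for a in matvec(self.w_spatial, desc)]   # 3x3 positions
        return alpha, beta, gamma, delta

    def kernel_for_bin(self, desc):
        alpha, beta, gamma, delta = self.attentions(desc)
        return [[[[gamma[o] * beta[c] * delta[3 * i + j]
                   * sum(alpha[k] * self.bases[k][o][c][i][j] for k in range(self.k))
                   for j in range(3)] for i in range(3)]
                 for c in range(self.c_in)] for o in range(self.c_out)]

    def __call__(self, x):
        # x is indexed x[channel][time][frequency]
        T, F = len(x[0]), len(x[0][0])
        y = [[[0.0] * F for _ in range(T)] for _ in range(self.c_out)]
        for f in range(F):
            desc = [sum(x[c][t][f] for t in range(T)) / T for c in range(self.c_in)]
            w = self.kernel_for_bin(desc)  # kernel used only for output bin f
            for o in range(self.c_out):
                for t in range(T):
                    acc = 0.0
                    for c in range(self.c_in):
                        for i in range(3):
                            for j in range(3):
                                tt, ff = t + i - 1, f + j - 1
                                if 0 <= tt < T and 0 <= ff < F:
                                    acc += w[o][c][i][j] * x[c][tt][ff]
                    y[o][t][f] = acc
        return y


# Usage: 2 input channels, 5 frames, 8 frequency bins
rng = random.Random(1)
x = [[[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)] for _ in range(2)]
conv = MFDConvSketch(c_in=2, c_out=3)
y = conv(x)
print(len(y), len(y[0]), len(y[0][0]))  # 3 5 8
```

Computing a separate kernel for each frequency bin is what makes the convolution frequency-adaptive. A practical version would vectorise this with tensor operations rather than Python loops.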
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Speech and Audio Processing
Methods: Convolution
