Multi-dimensional frequency dynamic convolution with confident mean teacher for sound event detection

Shengchang Xiao; Xueshuai Zhang; Pengyuan Zhang

arXiv:2302.09256 · eess.AS · February 22, 2023 · 1 citation

TL;DR

This paper introduces multi-dimensional frequency dynamic convolution (MFDConv) with a novel attention mechanism to improve sound event detection by better capturing time-frequency features, and proposes a confident mean teacher to enhance pseudo-label accuracy.

Contribution

It presents a new multi-dimensional frequency dynamic convolution and a confident mean teacher framework, advancing feature extraction and pseudo-label reliability in sound event detection.
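To make "frequency dynamic convolution" concrete, here is a minimal NumPy sketch under stated assumptions. It follows the common frequency dynamic convolution formulation: K basis kernels are mixed with attention weights computed separately for each frequency bin from a time-pooled descriptor. To hint at the "multiple dimensions" idea, it also applies an output-channel attention computed in parallel from the same descriptor. The projections `w_kernel` and `w_out`, the softmax/sigmoid choices, and all function names are hypothetical. This is not the authors' MFDConv, whose attention network and exact set of dimensions may differ.

```python
import numpy as np


def softmax(z, axis, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    z = z / temperature
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def conv2d_same(x, w):
    """Naive 'same'-padded 2-D cross-correlation.

    x: (C_in, T, F) feature map; w: (C_out, C_in, kT, kF) with odd kernel sizes.
    Returns an array of shape (C_out, T, F).
    """
    kt, kf = w.shape[2:]
    pt, pf = kt // 2, kf // 2
    _, n_t, n_f = x.shape
    xp = np.pad(x, ((0, 0), (pt, pt), (pf, pf)))
    out = np.zeros((w.shape[0], n_t, n_f))
    for i in range(kt):
        for j in range(kf):
            patch = xp[:, i:i + n_t, j:j + n_f]  # shifted (C_in, T, F) view
            out += np.einsum("oc,ctf->otf", w[:, :, i, j], patch)
    return out


def freq_dynamic_conv(x, basis, w_kernel, w_out, temperature=1.0):
    """Frequency-adaptive dynamic convolution (illustrative sketch, hypothetical names).

    x:        (C_in, T, F) input time-frequency features
    basis:    (K, C_out, C_in, kT, kF) basis kernels
    w_kernel: (K, C_in) hypothetical projection for the kernel attention
    w_out:    (C_out, C_in) hypothetical projection for the output-channel attention
    """
    desc = x.mean(axis=1)  # pool over time -> one descriptor per frequency bin, (C_in, F)
    # Two attentions computed in parallel from the same descriptor, per frequency bin.
    pi = softmax(w_kernel @ desc, axis=0, temperature=temperature)  # (K, F), sums to 1 over K
    alpha = 1.0 / (1.0 + np.exp(-(w_out @ desc)))                   # (C_out, F), values in (0, 1)
    y_k = np.stack([conv2d_same(x, kernel) for kernel in basis])    # (K, C_out, T, F)
    y = np.einsum("kf,kotf->otf", pi, y_k)  # per-frequency mixture of basis-kernel outputs
    return y * alpha[:, None, :]            # per-frequency output-channel scaling


# Usage: 2 input channels, 10 frames, 8 frequency bins, K=4 basis kernels, 3 output channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 10, 8))
basis = rng.standard_normal((4, 3, 2, 3, 3))
y = freq_dynamic_conv(x, basis, rng.standard_normal((4, 2)), rng.standard_normal((3, 2)))
print(y.shape)  # (3, 10, 8)
```

Because convolution is linear, mixing the K convolution outputs with per-frequency weights gives the same result as convolving with a per-frequency weighted kernel. That is why the sketch convolves once per basis kernel and mixes afterwards instead of building a separate kernel for every frequency bin.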

Findings

01. Achieved a PSDS1 of 0.470 on the DESED dataset.

02. Achieved a PSDS2 of 0.692 on the DESED dataset.

03. Improved feature extraction with MFDConv and more reliable pseudo-labels with the confident mean teacher.

Abstract

Recently, convolutional neural networks (CNNs) have been widely used in sound event detection (SED). However, traditional convolution is deficient in learning time-frequency domain representations of different sound events. To address this issue, we propose multi-dimensional frequency dynamic convolution (MFDConv), a new design that endows convolutional kernels with frequency-adaptive dynamic properties along multiple dimensions. MFDConv utilizes a novel multi-dimensional attention mechanism with a parallel strategy to learn complementary frequency-adaptive attentions, which substantially strengthen the feature extraction ability of convolutional kernels. Moreover, to improve the performance of the mean teacher, we propose the confident mean teacher, which increases the accuracy of pseudo-labels from the teacher and trains the student with high-confidence labels. Experimental results show…
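The confident mean teacher idea (filtering the teacher's pseudo-labels and training the student on high-confidence ones) can be illustrated with a small NumPy sketch. It assumes a teacher that outputs clip-level and frame-level class probabilities, binarizes them with hypothetical thresholds `clip_thr` and `frame_thr`, keeps frame-level labels only for classes active at clip level, and weights each label by the probability the teacher assigned to it. The function names, thresholds, weighting rule, and EMA decay are illustrative assumptions; the paper's exact filtering and weighting rules may differ.

```python
import numpy as np


def confident_pseudo_labels(p_clip, p_frame, clip_thr=0.5, frame_thr=0.5):
    """Hypothetical confidence-aware pseudo-labels from teacher outputs.

    p_clip:  (C,) clip-level (weak) class probabilities from the teacher
    p_frame: (T, C) frame-level (strong) class probabilities from the teacher
    Returns hard labels and per-label confidence weights for both levels.
    """
    y_clip = (p_clip > clip_thr).astype(float)
    # Keep frame-level positives only for classes the teacher detects in the clip.
    y_frame = (p_frame > frame_thr).astype(float) * y_clip[None, :]
    # Confidence = probability the teacher assigned to the label it was given.
    w_clip = np.where(y_clip == 1, p_clip, 1.0 - p_clip)
    w_frame = np.where(y_frame == 1, p_frame, 1.0 - p_frame)
    return y_clip, y_frame, w_clip, w_frame


def weighted_bce(pred, target, weight, eps=1e-7):
    """Binary cross-entropy averaged with per-element confidence weights."""
    pred = np.clip(pred, eps, 1.0 - eps)
    loss = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float((weight * loss).sum() / max(weight.sum(), eps))


def ema_update(teacher, student, decay=0.999):
    """Mean-teacher step: teacher weights track an exponential moving average of the student."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name] for name in teacher}


# Usage: 2 classes, 2 frames. Class 1 is dropped at frame level because its
# clip-level probability (0.2) is below clip_thr, even though one frame scores 0.9.
p_clip = np.array([0.9, 0.2])
p_frame = np.array([[0.8, 0.7], [0.3, 0.9]])
y_clip, y_frame, w_clip, w_frame = confident_pseudo_labels(p_clip, p_frame)
student_frame = np.array([[0.6, 0.4], [0.2, 0.5]])  # hypothetical student predictions
loss = weighted_bce(student_frame, y_frame, w_frame)
```

Weighting by confidence, rather than discarding uncertain labels outright, keeps every frame in the loss while letting labels the teacher is unsure about contribute less.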

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Speech and Audio Processing

Methods: Convolution