Diversifying and Expanding Frequency-Adaptive Convolution Kernels for Sound Event Detection

Hyeonuk Nam; Seong-Hu Kim; Deokki Min; Junhyeok Lee; Yong-Hwa Park

arXiv:2406.05341 · eess.AS · June 11, 2024

TL;DR

This paper introduces dilated frequency dynamic convolution (DFD conv) for sound event detection, which diversifies and expands frequency-adaptive kernels using dilation, leading to improved detection performance over previous methods.

Contribution

The paper proposes DFD conv with dilation to diversify and expand frequency-adaptive kernels, enhancing sound event detection accuracy.
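The idea of combining dilated basis kernels with frequency-dependent weights can be illustrated with a toy, single-channel sketch in plain Python. It is not the authors' implementation: the 3x3 kernel size, dilation along frequency only, and the `attn_scale` / `attn_bias` parameters (hypothetical stand-ins for a learned attention module) are all assumptions made for illustration. Because convolution is linear in the kernel, weighting the basis outputs per frequency bin is equivalent to weighting the basis kernels themselves.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dilated_conv2d(x, kernel, freq_dilation):
    """Zero-padded 'same' 3x3 convolution (cross-correlation) on x[freq][time].

    Dilation is applied along frequency only; that choice is an assumption
    of this sketch.
    """
    n_freq, n_time = len(x), len(x[0])
    out = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    ff = f + (i - 1) * freq_dilation  # dilated frequency tap
                    tt = t + (j - 1)                  # undilated time tap
                    if 0 <= ff < n_freq and 0 <= tt < n_time:
                        acc += kernel[i][j] * x[ff][tt]
            out[f][t] = acc
    return out


def dfd_conv(x, basis_kernels, dilations, attn_scale, attn_bias):
    """Combine dilated basis-kernel outputs with per-frequency attention.

    attn_scale / attn_bias are hypothetical: the attention logit for basis k
    at frequency f is attn_scale[k] * mean_t(x[f][t]) + attn_bias[k].
    Returns (output[freq][time], attention[freq][basis]).
    """
    basis_outputs = [
        dilated_conv2d(x, kernel, dilation)
        for kernel, dilation in zip(basis_kernels, dilations)
    ]
    n_time = len(x[0])
    attention, output = [], []
    for f in range(len(x)):
        pooled = sum(x[f]) / n_time  # average over time for this frequency bin
        weights = softmax([s * pooled + b for s, b in zip(attn_scale, attn_bias)])
        attention.append(weights)
        output.append([
            sum(w * y[f][t] for w, y in zip(weights, basis_outputs))
            for t in range(n_time)
        ])
    return output, attention
```

Calling `dfd_conv(x, [k1, k2], [1, 2], ...)` gives the two basis kernels different receptive fields along frequency, which is the sense in which dilation both diversifies and expands the frequency-adaptive kernel in this sketch.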

Findings

01

DFD-CRNN outperforms FDY-CRNN by 3.12% in polyphonic sound detection score (PSDS).

02

Varying dilation sizes improves kernel diversity.

03

Dilated basis kernels are effectively diversified as shown by attention weight analysis.
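One plausible way to quantify this kind of diversity, offered here only as an assumption about what an attention weight analysis might compute, is the variance of each basis kernel's attention weight across frequency bins: a weight that barely changes with frequency indicates a kernel that contributes the same way everywhere. The helper name `attention_variance_per_basis` and the `attention[freq][basis]` layout are hypothetical.

```python
from statistics import pvariance


def attention_variance_per_basis(attention):
    """Population variance across frequency bins for each basis kernel.

    attention is a list of rows, one per frequency bin, each holding one
    weight per basis kernel (an assumed layout for this sketch).
    """
    n_basis = len(attention[0])
    return [pvariance([row[k] for row in attention]) for k in range(n_basis)]
```

Under this reading, uniform weights at every frequency give zero variance for all basis kernels, while weights that switch between kernels from bin to bin give larger values.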

Abstract

Frequency dynamic convolution (FDY conv) has shown state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by a frequency-varying combination of basis kernels. However, FDY conv lacks an explicit means to diversify frequency-adaptive kernels, potentially limiting performance. In addition, the size of basis kernels is limited, while time-frequency patterns span a larger spectro-temporal range. Therefore, we propose dilated frequency dynamic convolution (DFD conv), which diversifies and expands frequency-adaptive kernels by introducing different dilation sizes to basis kernels. Experiments showed the advantages of varying dilation sizes along the frequency dimension, and analysis of attention weight variance proved that dilated basis kernels are effectively diversified. By adapting class-wise median filter with intersection-based F1 score, proposed DFD-CRNN…
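A class-wise median filter smooths each event class's frame-level decisions with its own window length, so short events can keep a short window while long events get a longer one. The sketch below shows only the filtering step, assuming binary frame decisions and zero padding at the edges; the class names and window lengths in the usage note are hypothetical, and how the windows are chosen is not covered here.

```python
from statistics import median


def median_filter_1d(sequence, window):
    """Median-filter a 1-D sequence, zero-padding both ends.

    window must be odd so the median is always one of the input values.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = [0] * half + list(sequence) + [0] * half
    return [median(padded[t:t + window]) for t in range(len(sequence))]


def class_wise_median_filter(decisions, windows):
    """Apply a separate median window to each class.

    decisions: {class_name: [frame decisions]}
    windows:   {class_name: odd window length in frames}
    """
    return {
        name: median_filter_1d(frames, windows[name])
        for name, frames in decisions.items()
    }
```

For example, `class_wise_median_filter({"short_event": [0, 1, 0, 0, 0], "long_event": [1, 1, 0, 1, 1]}, {"short_event": 1, "long_event": 3})` leaves the isolated frame of the first class untouched and fills the one-frame gap in the second.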

Code & Models

Repositories

frednam93/MDFD-SED (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Convolution