Frequency Dependent Sound Event Detection for DCASE 2022 Challenge Task 4

Hyeonuk Nam; Seong-Hu Kim; Deokki Min; Byeong-Yun Ko; Seung-Deok Choi; Yong-Hwa Park

arXiv:2206.11645·eess.AS·June 24, 2022·1 cites

TL;DR

This paper applies two frequency-dependent methods, FilterAugment and frequency dynamic convolution, to sound event detection (SED), exploiting the frequency dimension of audio data. The submitted models reach a best PSDS1 of 0.4704 and a best PSDS2 of 0.8224 on DCASE 2022 Challenge Task 4.

Contribution

It applies frequency-dependent techniques designed specifically for SED, addressing the fact that methods borrowed from other domains have been used on audio without properly accounting for its time and frequency dimensions.

Findings

01 Achieved best PSDS1 score of 0.4704

02 Achieved best PSDS2 score of 0.8224

03 Demonstrated the effectiveness of frequency-dependent methods in SED

Abstract

While many deep learning methods from other domains have been applied to sound event detection (SED), the differences between the original domains of these methods and SED have not been appropriately considered so far. As SED takes audio data with two dimensions (time and frequency) as input, a thorough understanding of these two dimensions is essential when applying methods from other domains to SED. Previous works showed that methods that address the frequency dimension are especially powerful in SED. By applying FilterAugment and frequency dynamic convolution, which are frequency-dependent methods proposed to enhance SED performance, our submitted models achieved a best PSDS1 of 0.4704 and a best PSDS2 of 0.8224.
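
The abstract names FilterAugment but does not describe its mechanics. The following is a minimal sketch of the general idea of frequency-dependent augmentation, assuming a step-type variant in which a log-mel spectrogram (in dB) is split into a few random frequency bands and each band receives its own random gain. The function name `filter_augment_step`, its parameters, and the default ranges are hypothetical choices for illustration, not the authors' implementation or the API of the linked repository.

```python
import random


def filter_augment_step(log_mel, db_range=(-6.0, 6.0), n_band=(3, 6), rng=None):
    """Add one random gain (in dB) to each of several random frequency bands.

    log_mel: list of rows, one per mel bin, each a list of dB values over time.
    Returns the augmented copy and the band edges that were drawn.
    All names and default values here are illustrative assumptions.
    """
    rng = rng or random.Random()
    n_freq = len(log_mel)

    # Draw the number of bands, never more than the number of mel bins.
    n_bands = min(rng.randint(n_band[0], n_band[1]), n_freq)

    # Draw distinct interior boundaries so that every band is non-empty.
    cuts = sorted(rng.sample(range(1, n_freq), n_bands - 1))
    edges = [0] + cuts + [n_freq]

    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        gain = rng.uniform(*db_range)  # one gain shared by the whole band
        for f in range(lo, hi):
            # Adding in the dB domain scales the linear magnitude.
            out.append([v + gain for v in log_mel[f]])
    return out, edges


if __name__ == "__main__":
    # Toy input: 16 mel bins by 4 frames, all at 0 dB.
    spec = [[0.0] * 4 for _ in range(16)]
    augmented, band_edges = filter_augment_step(spec, rng=random.Random(0))
    print("band edges:", band_edges)
    print("gain of first bin (dB):", augmented[0][0])
```

Because the gain varies along the frequency axis only, every time frame within a band is shifted by the same amount, which is what makes the augmentation frequency dependent rather than time dependent. A practical version would operate on batched tensors in the training pipeline.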

Code & Models

Repositories

frednam93/FDY-SED (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Convolution