Frequency Dependent Sound Event Detection for DCASE 2022 Challenge Task 4
Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Byeong-Yun Ko, Seung-Deok Choi, Yong-Hwa Park

TL;DR
This paper applies two frequency-dependent methods, FilterAugment and frequency dynamic convolution, to improve sound event detection (SED) by exploiting the frequency dimension of audio data. The submitted models reach a best PSDS1 of 0.4704 and a best PSDS2 of 0.8224 on DCASE 2022 Challenge Task 4.
Contribution
It applies frequency-dependent techniques designed specifically for SED, rather than methods carried over from other domains whose differences from audio data had not been adequately considered.
Findings
Achieved best PSDS1 score of 0.4704
Achieved best PSDS2 score of 0.8224
Demonstrated the effectiveness of frequency-dependent methods in SED
Abstract
While many deep learning methods from other domains have been applied to sound event detection (SED), the differences between the methods' original domains and SED have not been appropriately considered so far. As SED takes audio data with two dimensions (time and frequency) as input, a thorough understanding of these two dimensions is essential for applying methods from other domains to SED. Previous works showed that methods that address the frequency dimension are especially powerful in SED. By applying FilterAugment and frequency dynamic convolution, which are frequency-dependent methods proposed to enhance SED performance, our submitted models achieved a best PSDS1 of 0.4704 and a best PSDS2 of 0.8224.
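To make the idea of a frequency-dependent method concrete, the following is a minimal sketch of a step-type filter augmentation in the spirit of FilterAugment: the frequency axis is split into a few random bands and each band receives its own random gain, constant over time. It is an illustration under stated assumptions, not the authors' implementation. The function name `filter_augment_step`, the default band count, and the gain range are hypothetical choices, and the spectrogram is assumed to be in dB so that gains can simply be added. Plain Python lists are used instead of tensors to keep the example self-contained.

```python
import random


def filter_augment_step(log_mel, n_bands=(3, 6), db_range=(-6.0, 6.0), rng=None):
    """Add a random piecewise-constant gain along the frequency axis.

    log_mel: list of frequency-bin rows, each a list of per-frame values
             (shape n_freq x n_frames). ASSUMPTION: values are in dB.
    n_bands: inclusive range for the number of bands (illustrative default).
    db_range: range of the per-band gain in dB (illustrative default).
    Returns (augmented spectrogram, per-bin gains).
    """
    rng = rng or random.Random()
    n_freq = len(log_mel)

    # Choose how many bands to use, never more than the number of bins.
    k = min(rng.randint(n_bands[0], n_bands[1]), n_freq)

    # Choose k - 1 distinct interior boundaries, giving k contiguous bands.
    cuts = sorted(rng.sample(range(1, n_freq), k - 1))
    edges = [0] + cuts + [n_freq]

    # Draw one gain per band and expand it to a per-bin gain vector.
    gains = [0.0] * n_freq
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_gain = rng.uniform(db_range[0], db_range[1])
        for f in range(lo, hi):
            gains[f] = band_gain

    # The same gain is applied to every time frame of a given frequency bin.
    augmented = [[v + gains[f] for v in row] for f, row in enumerate(log_mel)]
    return augmented, gains


# Usage: a silent 16-bin x 4-frame spectrogram makes the applied gains visible.
spec = [[0.0] * 4 for _ in range(16)]
aug, gains = filter_augment_step(spec, rng=random.Random(0))
```

The gain depends only on the frequency bin and never on the time frame, which is what separates this kind of augmentation from time-domain or frame-wise perturbations. The input spectrogram is left unmodified, so the function can be called repeatedly on the same training example.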
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Convolution
