Few-Shot Bioacoustic Event Detection with Frame-Level Embedding Learning System
PengYuan Zhao, ChengWei Lu, Liang Zou

TL;DR
This paper introduces a frame-level embedding learning system for few-shot bioacoustic event detection. It combines log-mel and PCEN features, a Netmamba Encoder, data augmentation, and post-processing, and reached a 56.4% F-measure and 2nd place in DCASE2024 Task 5.
Contribution
The work presents a novel system combining log-mel, PCEN, Netmamba Encoder, and data augmentation for improved few-shot bioacoustic event detection.
Findings
Achieved 56.4% F-measure score
Secured 2nd place in DCASE2024 challenge
Demonstrated effectiveness of data augmentation and post-processing
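For readers unfamiliar with the metric, the F-measure is the harmonic mean of precision and recall. The sketch below shows only that final arithmetic from hypothetical counts of matched and unmatched events. The function name and the example counts are illustrative assumptions. The way the challenge matches predicted events to reference events is not described on this page and is not reproduced here.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from event counts.

    The counts are assumed to come from some event-matching step that is
    outside the scope of this sketch.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, not results from the paper.
    print(f_measure(true_pos=8, false_pos=2, false_neg=2))
```

With equal precision and recall the F-measure equals both. When they differ, the harmonic mean lies closer to the smaller of the two.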
Abstract
This technical report presents our frame-level embedding learning system for few-shot bioacoustic event detection (Task 5) of the DCASE2024 challenge. We use log-mel and PCEN features extracted from the input audio, a Netmamba Encoder as the information interaction network, data augmentation strategies to improve the generalizability of the trained model, and multiple post-processing methods. Our final system achieved an F-measure of 56.4%, ranking 2nd in the few-shot bioacoustic event detection task of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2024.
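The two front-end features named in the abstract can be illustrated with a minimal sketch, assuming a mel-band energy matrix has already been computed. It uses the standard PCEN formulation: each band is smoothed over time, the energy is divided by that smoothed value, and the result is compressed with a root. The function names, the smoothing coefficient, and the other parameter values are illustrative assumptions, not the authors' settings.

```python
import math


def log_mel(energy, floor=1e-10):
    """Natural-log compression of a [frames][bands] mel-energy matrix."""
    return [[math.log(max(e, floor)) for e in frame] for frame in energy]


def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization of a [frames][bands] matrix.

    M[t] = (1 - s) * M[t-1] + s * E[t]                      (per-band smoother)
    PCEN[t] = (E[t] / (eps + M[t])**alpha + delta)**r - delta**r
    Parameter defaults are illustrative, not taken from the paper.
    """
    if not energy:
        return []
    smoothed = list(energy[0])  # start the smoother at the first frame
    out = []
    for frame in energy:
        smoothed = [(1 - s) * m + s * e for m, e in zip(smoothed, frame)]
        out.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smoothed)
        ])
    return out


if __name__ == "__main__":
    # One band: steady background followed by a sudden louder frame.
    band = [[1.0]] * 50 + [[10.0]]
    normalized = pcen(band)
    print(normalized[0][0], normalized[-1][0])
```

In this toy input the steady background maps to a small constant value while the sudden onset stands out. That behaviour is the usual motivation for pairing PCEN with log-mel in recordings with varying background levels.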
Taxonomy
Topics: Advanced Chemical Sensor Technologies
