Channel-Spatial-Based Few-Shot Bird Sound Event Detection
Lingwen Liu, Yuxuan Feng, Haitao Fu, Yajie Yang, Xin Pan, Chenlei Jin

TL;DR
This paper introduces a novel few-shot learning model for bird sound event detection that leverages channel and spatial attention mechanisms to improve detection accuracy with limited training data.
Contribution
It proposes the Metric Channel-Spatial Network with a Channel Spatial Squeeze-Excitation block, enhancing feature learning for few-shot bird sound detection.
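To make the idea of a combined channel and spatial squeeze-excitation block concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the reduction ratio `r`, the weight names `w1`, `w2`, `w_spatial`, the omission of bias terms, and the choice to merge the two branches by addition are all assumptions made for illustration, since this summary does not specify them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_se(x, w1, w2, w_spatial):
    """Recalibrate a feature map x of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) form the channel bottleneck.
    w_spatial: (C,) plays the role of a 1x1 convolution across channels.
    All weights are hypothetical parameters for this sketch.
    """
    # Channel branch: squeeze each channel to a scalar, then gate channels.
    z = x.mean(axis=(1, 2))                      # (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))    # (C,), values in (0, 1)
    x_channel = x * s[:, None, None]

    # Spatial branch: squeeze across channels, then gate each location.
    q = sigmoid(np.tensordot(w_spatial, x, axes=([0], [0])))  # (H, W)
    x_spatial = x * q[None, :, :]

    # Assumed merge rule: element-wise sum of the two recalibrated maps.
    return x_channel + x_spatial

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 5, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
w_spatial = rng.standard_normal(C) * 0.1

y = channel_spatial_se(x, w1, w2, w_spatial)
print(y.shape)
```

The output keeps the input shape, so a block of this kind can be dropped between convolutional stages of an embedding network without changing the surrounding layers.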
Findings
Achieved an F-measure of 66.84% on the DCASE 2022 Take5 dataset
Attained a PSDS of 58.98% on the same dataset
Demonstrated the effectiveness of attention mechanisms in few-shot bird sound detection
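For readers unfamiliar with the first metric, the F-measure is the harmonic mean of precision and recall. The sketch below shows only that arithmetic. How predicted events are matched to reference events is not described in this summary, so the true-positive, false-positive, and false-negative counts are taken as given, and the numbers in the example are made up rather than taken from the paper.

```python
def f_measure(tp, fp, fn):
    """Return (precision, recall, F-measure) from event counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f

# Illustrative counts only (not results from the paper).
precision, recall, f = f_measure(tp=60, fp=20, fn=40)
print(f"precision={precision:.2f} recall={recall:.2f} F={f:.4f}")
```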
Abstract
In this paper, we propose a model for bird sound event detection that targets classes with only a small number of training samples, as is typical of the long-tail distribution found in everyday recordings. We therefore investigate bird sound detection under the few-shot learning paradigm. By integrating channel and spatial attention mechanisms, improved feature representations can be learned from few-shot training datasets. We develop the Metric Channel-Spatial Network by incorporating a Channel Spatial Squeeze-Excitation block, which combines these attention mechanisms, into the prototype network. We evaluate the Metric Channel-Spatial Network on the DCASE 2022 Take5 benchmark dataset, achieving an F-measure of 66.84% and a PSDS of 58.98%. Our experiments demonstrate that the combination of channel and spatial attention mechanisms effectively enhances the performance of bird sound classification and detection.
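The metric-learning step of a prototype network can be sketched as follows, assuming embeddings have already been produced by some encoder. This is a generic nearest-prototype classifier, not the authors' code: the function names, the squared Euclidean distance, and the toy 2-D embeddings are assumptions chosen for illustration.

```python
import numpy as np

def compute_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype."""
    labels = np.asarray(support_labels)
    classes = np.unique(labels)
    protos = np.stack(
        [support_embeddings[labels == c].mean(axis=0) for c in classes]
    )
    return classes, protos

def classify(query_embeddings, classes, protos):
    """Assign each query to its nearest prototype.

    Returns the predicted labels and a softmax over negative distances.
    """
    diff = query_embeddings[:, None, :] - protos[None, :, :]
    dist = (diff ** 2).sum(axis=-1)              # (num_queries, num_classes)
    logits = -dist
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return classes[dist.argmin(axis=1)], probs

# Toy embeddings: label 1 = target bird call, label 0 = background.
support = np.array([[1.0, 1.0], [1.2, 0.8], [-1.0, -1.0], [-0.8, -1.2]])
support_labels = [1, 1, 0, 0]
queries = np.array([[0.9, 1.1], [-1.1, -0.7]])

classes, protos = compute_prototypes(support, support_labels)
pred, probs = classify(queries, classes, protos)
print(pred)
```

Because classification depends only on distances to class means, new classes can be handled at test time from a handful of labelled examples without retraining the encoder, which is what makes this setup suited to few-shot detection.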
Taxonomy
Topics: Animal Vocal Communication and Behavior · Marine animal studies overview · Music and Audio Processing
