Self-supervised Learning for Acoustic Few-Shot Classification
Jingyong Liang, Bernd Meyer, Isaac Ning Lee, Thanh-Toan Do

TL;DR
This paper introduces a new architecture that combines CNNs and state space models, pre-trains it with self-supervised learning, and applies it to acoustic few-shot classification. It reports better performance than existing architectures on benchmarks and on real-world bioacoustic data.
Contribution
The paper proposes a hybrid CNN and state space model architecture, pre-trained with contrastive self-supervised learning, to improve acoustic few-shot classification.
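This excerpt says only that the model is trained with contrastive learning; it does not give the exact objective. As an illustration, the sketch below assumes a SimCLR-style NT-Xent loss. Two augmented views of the same clip form a positive pair, and every other clip in the batch acts as a negative. The function names, the temperature of 0.1 and the toy 2-D embeddings are hypothetical choices, not the authors' settings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, guarded against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, 1e-12)


def nt_xent(z1: List[Vector], z2: List[Vector], temperature: float = 0.1) -> float:
    """SimCLR-style NT-Xent loss for N positive pairs (z1[i], z2[i]).

    ASSUMPTION (not from the paper): z1[i] and z2[i] embed two augmented
    views of the same audio clip; all other embeddings are negatives.
    """
    if len(z1) != len(z2) or len(z1) < 2:
        raise ValueError("need two equally sized views with at least 2 clips")
    z = list(z1) + list(z2)
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same clip
        # temperature-scaled similarities to every embedding except itself
        sims = {k: cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i}
        # numerically stable log-sum-exp over the positive and all negatives
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += log_denom - sims[pos]  # = -log softmax probability of the positive
    return total / (2 * n)


views_a = [[1.0, 0.0], [0.0, 1.0]]
print(nt_xent(views_a, views_a))                   # small: positives aligned
print(nt_xent(views_a, [[0.0, 1.0], [1.0, 0.0]]))  # large: positives swapped
```

The loss is low when the two views of each clip embed close together and far from the other clips, which is the property a contrastive pre-training stage is meant to instil before any labels are used.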
Findings
Outperforms existing architectures on standard benchmarks.
Effective with very limited labeled data.
Captures long-range temporal dependencies effectively.
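To show how a state space layer can carry information over long time spans, here is a minimal, non-selective diagonal linear SSM discretized with zero-order hold. This is a simplification. Mamba (listed under Methods) makes its step size and input/output projections input-dependent, which this sketch does not do. Nothing here reflects how the paper connects the SSM to its CNN front end, and all parameter values are hypothetical.

```python
import math
from typing import List, Tuple


def discretize_zoh(a: float, b: float, delta: float) -> Tuple[float, float]:
    """Zero-order-hold discretization of the scalar system h'(t) = a*h(t) + b*x(t).

    Requires a != 0; a < 0 gives a stable, decaying state.
    """
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_scan(x: List[float], a: List[float], b: List[float],
             c: List[float], delta: float = 1.0) -> List[float]:
    """Run a diagonal linear SSM over the sequence x.

    State dimension n has its own (a[n], b[n], c[n]); the output at each
    step is the c-weighted sum of the state vector.
    """
    params = [discretize_zoh(an, bn, delta) for an, bn in zip(a, b)]
    h = [0.0] * len(a)
    ys = []
    for xt in x:
        # h_t = a_bar * h_{t-1} + b_bar * x_t, independently per state dimension
        h = [a_bar * hn + b_bar * xt for (a_bar, b_bar), hn in zip(params, h)]
        ys.append(sum(cn * hn for cn, hn in zip(c, h)))
    return ys


impulse = [1.0] + [0.0] * 199               # a single event at t = 0
slow = ssm_scan(impulse, a=[-0.01], b=[1.0], c=[1.0])
fast = ssm_scan(impulse, a=[-1.0], b=[1.0], c=[1.0])
print(slow[-1], fast[-1])
```

With a step size of 1, the slowly decaying state (a = -0.01) still holds roughly 14% of the impulse after 199 steps, while the fast state (a = -1) has decayed to essentially zero. In S4/Mamba-style models, a range of learned decay rates lets one layer mix short- and long-range context.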
Abstract
Labelled data are often limited, and self-supervised learning is one of the most important approaches for reducing labelling requirements. While it has been extensively explored in the image domain, it has so far received far less attention in the acoustic domain. Yet reducing labelling is a key requirement for many acoustic applications. In bioacoustics specifically, sufficient labels for fully supervised learning are rarely available. This has led to the widespread use of acoustic recognisers that have been pre-trained on unrelated data for bioacoustic tasks. We posit that training on the actual task data and combining self-supervised pre-training with few-shot classification is a superior approach that can deliver high accuracy even when only a few labels are available. To this end, we introduce and evaluate a new architecture that combines CNN-based…
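The abstract pairs self-supervised pre-training with few-shot classification but, in this excerpt, does not say how the few-shot step is performed. One common option, assumed here purely for illustration, is a prototypical-network-style classifier. The few labelled support clips are embedded with the frozen pre-trained encoder and averaged into one prototype per class, and each query clip is assigned to the nearest prototype. The embeddings and the class names `species_a` and `species_b` are toy placeholders, not data from the paper.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average the support embeddings of each class into a single prototype."""
    protos = {}
    for label, embs in support.items():
        dim = len(embs[0])
        protos[label] = [sum(e[d] for e in embs) / len(embs) for d in range(dim)]
    return protos


def classify(query: Vector, protos: Dict[str, List[float]]) -> str:
    """Return the label whose prototype is closest in squared Euclidean distance."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda label: sq_dist(query, protos[label]))


# Hypothetical 2-way, 2-shot episode with toy 2-D embeddings.
support = {
    "species_a": [[1.0, 0.1], [0.9, 0.0]],
    "species_b": [[0.0, 1.0], [0.1, 0.9]],
}
protos = prototypes(support)
print(classify([0.8, 0.2], protos))  # species_a
```

Because only the prototypes depend on labels, adding a new class needs just a handful of labelled clips and no retraining of the encoder, which is the low-label regime the abstract targets.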
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Contrastive Learning
