Self-supervised Learning for Acoustic Few-Shot Classification

Jingyong Liang; Bernd Meyer; Isaac Ning Lee; Thanh-Toan Do

arXiv:2409.09647·cs.SD·May 16, 2025

Open Access

TL;DR

This paper introduces a novel self-supervised learning architecture combining CNNs and state space models for acoustic few-shot classification, demonstrating superior performance on benchmarks and real-world bioacoustic data.

Contribution

It proposes a new CNN and state space model hybrid architecture trained with contrastive learning for improved acoustic few-shot classification.
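For readers unfamiliar with contrastive training, the following is a minimal NumPy sketch of one common contrastive objective, the NT-Xent (normalised temperature-scaled cross-entropy) loss. It is an assumption that this specific loss is representative: the summary above only says "contrastive learning", and the function name `nt_xent_loss`, the temperature value, and the toy embeddings are all hypothetical. In practice the two inputs would be encoder outputs for two augmented views of the same audio clips.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for paired embeddings, where z_a[i] and z_b[i] are two views of item i.

    Illustrative only: the paper's actual loss is not specified in this summary.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)                # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # unit-normalise rows
    sim = (z @ z.T) / temperature                         # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                        # a sample is never its own negative
    # Row i's positive is its other view: i <-> i + N.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# Toy check: correctly paired views should score a lower loss than mismatched pairs.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(8, 16))
z_b = z_a + 0.01 * rng.normal(size=(8, 16))               # near-identical second view
loss_aligned = nt_xent_loss(z_a, z_b)
loss_shuffled = nt_xent_loss(z_a, np.roll(z_b, 1, axis=0))  # pairs deliberately misaligned
print(loss_aligned, loss_shuffled)
```

The loss pulls the two views of each clip together and pushes all other clips in the batch away, which is why no labels are needed during pre-training.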

Findings

01. Outperforms existing architectures on standard benchmarks.

02. Effective with very limited labeled data.

03. Captures long-range temporal dependencies effectively.
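To give a rough intuition for how a state space model can carry information across long time spans, here is a minimal sketch of a diagonal, time-invariant linear state space recurrence. This is a simplification and an assumption: it is not the paper's architecture, and it is not the selective (input-dependent) mechanism used in Mamba. The function name `ssm_scan` and the parameter values are hypothetical.

```python
import numpy as np

def ssm_scan(u, a, b, c):
    """Run a diagonal linear state space model over a 1-D input sequence.

    State update: x[k] = a * x[k-1] + b * u[k]   (elementwise, one entry per state channel)
    Output:       y[k] = c . x[k]
    """
    x = np.zeros_like(a, dtype=float)
    y = np.empty(len(u), dtype=float)
    for k, u_k in enumerate(u):
        x = a * x + b * u_k        # each state channel decays at its own rate
        y[k] = np.dot(c, x)        # read out a weighted sum of the state
    return y

# Impulse response with one slow channel (0.99) and one fast channel (0.5).
a = np.array([0.99, 0.5])
b = np.ones(2)
c = np.ones(2)
u = np.zeros(200)
u[0] = 1.0
y = ssm_scan(u, a, b, c)
print(y[0], y[100])
```

With a decay close to 1, the slow channel still carries a visible trace of the first input after 100 steps, while the fast channel has forgotten it almost immediately. That persistence is the basic property that lets such models represent long-range structure in audio.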

Abstract

Labelled data are limited, and self-supervised learning is one of the most important approaches for reducing labelling requirements. While it has been extensively explored in the image domain, it has so far not received the same amount of attention in the acoustic domain. Yet reducing labelling is a key requirement for many acoustic applications. In bioacoustics specifically, sufficient labels for fully supervised learning are rarely available. This has led to the widespread use of acoustic recognisers that have been pre-trained on unrelated data for bioacoustic tasks. We posit that training on the actual task data and combining self-supervised pre-training with few-shot classification is a superior approach that can deliver high accuracy even when only a few labels are available. To this end, we introduce and evaluate a new architecture that combines CNN-based…
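As a rough illustration of the few-shot stage the abstract refers to, the sketch below classifies query embeddings by their nearest class prototype, where each prototype is the mean of a handful of labelled support embeddings. This is an assumption chosen for simplicity: the visible abstract does not say which few-shot classifier the authors use. The function name `prototype_classify` and the toy 2-D embeddings are hypothetical, and real embeddings would come from the pre-trained encoder.

```python
import numpy as np

def prototype_classify(support_emb, support_labels, query_emb):
    """Assign each query embedding to the class with the nearest mean support embedding."""
    classes = np.unique(support_labels)
    # One prototype per class: the mean of that class's support embeddings.
    protos = np.stack([support_emb[support_labels == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every query to every prototype.
    dists = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]

# Two classes, two labelled examples each (a "2-way 2-shot" toy episode).
support_emb = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
support_labels = np.array([0, 0, 1, 1])
query_emb = np.array([[0.2, 0.4], [9.5, 10.2]])
pred = prototype_classify(support_emb, support_labels, query_emb)
print(pred)
```

Because only class means are estimated, a classifier of this kind needs very few labels, provided the self-supervised encoder already places clips of the same class close together.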

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Contrastive Learning