Data-Efficient Self-Supervised Algorithms for Fine-Grained Birdsong Analysis

Houtan Ghaffari; Lukas Rauch; Paul Devos

arXiv:2511.12158 · cs.LG · May 20, 2026

TL;DR

This paper introduces a data-efficient, three-stage self-supervised learning pipeline for birdsong syllable detection, significantly reducing annotation costs and demonstrating effectiveness on Canary and Bengalese Finch songs.

Contribution

The paper proposes a residual MLP-RNN model (Residual Multi-Layer Perceptron Recurrent Neural Network) and a three-stage training pipeline that combines self-supervised, supervised, and semi-supervised learning for birdsong analysis.

Findings

01. Effective in extreme label-scarcity scenarios for Canary song

02. Generalizes well to Bengalese Finch song

03. Self-supervised embeddings enable unsupervised analysis

Abstract

Research in bioacoustics, neuroscience, and linguistics often uses birdsong as a proxy to acquire knowledge across diverse areas. This requires audio models to annotate and parse the birdsong. Developing such models requires precise, syllable-level annotated training data. Therefore, automated methods that reduce annotation costs are in demand. This work presents a data-efficient birdsong annotator called Residual Multi-Layer Perceptron Recurrent Neural Network. It then presents a three-stage training pipeline for developing reliable birdsong syllable detectors with minimal annotation. The first stage is self-supervised learning from unlabeled data. Two of the most successful pretraining paradigms are explored, namely, masked prediction and online clustering. The second stage is supervised training with effective data augmentation to produce a robust frame-level syllable detector for…
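
To make the idea of a frame-level syllable detector concrete, the sketch below shows one plausible way to turn per-frame class predictions into syllable segments with onset and offset times. It is an illustration only and is not taken from the paper: the function name `frames_to_segments`, the `hop_seconds` parameter, and the convention that label 0 means silence are all assumptions made for this example.

```python
from itertools import groupby


def frames_to_segments(frame_labels, hop_seconds, silence_label=0):
    """Collapse per-frame labels into (onset_s, offset_s, label) segments.

    Hypothetical helper for illustration; the label convention
    (silence_label=0) and hop_seconds are assumptions, not the paper's.
    """
    segments = []
    start = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive equal frames
        if label != silence_label:
            onset = start * hop_seconds
            offset = (start + length) * hop_seconds
            segments.append((onset, offset, label))
        start += length
    return segments


# Toy prediction sequence: two syllables (classes 1 and 2) separated by silence.
predicted = [0, 0, 1, 1, 1, 0, 2, 2, 0]
for onset, offset, label in frames_to_segments(predicted, hop_seconds=0.5):
    print(f"syllable {label}: {onset:.2f}s to {offset:.2f}s")
```

In practice a detector's raw frame predictions are often smoothed (for example with a majority vote over a short window) before this step, so that single-frame errors do not split a syllable; that smoothing is omitted here to keep the sketch short.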

Taxonomy

Topics: Animal Vocal Communication and Behavior · Music and Audio Processing · Speech and Audio Processing