Data-Efficient Self-Supervised Algorithms for Fine-Grained Birdsong Analysis
Houtan Ghaffari, Lukas Rauch, Paul Devos

TL;DR
This paper introduces a data-efficient, three-stage self-supervised learning pipeline for birdsong syllable detection, significantly reducing annotation costs and demonstrating effectiveness on Canary and Bengalese Finch songs.
Contribution
The paper proposes a novel residual MLP-RNN model and a three-stage training pipeline that combines self-supervised, supervised, and semi-supervised learning for birdsong analysis.
Findings
Effective in extreme label-scarcity scenarios for Canary song
Generalizes well to Bengalese Finch song
Self-supervised embeddings enable unsupervised analysis
Abstract
Research in bioacoustics, neuroscience, and linguistics often uses birdsong as a proxy to acquire knowledge across diverse areas. This requires audio models to annotate and parse the birdsong. Developing such models requires precise, syllable-level annotated training data. Therefore, automated methods that reduce annotation costs are in demand. This work presents a data-efficient birdsong annotator called Residual Multi-Layer Perceptron Recurrent Neural Network. It then presents a three-stage training pipeline for developing reliable birdsong syllable detectors with minimal annotation. The first stage is self-supervised learning from unlabeled data. Two of the most successful pretraining paradigms are explored, namely, masked prediction and online clustering. The second stage is supervised training with effective data augmentation to produce a robust frame-level syllable detector for…
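The masked-prediction pretraining mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the span length, mask ratio, zero-filling of masked frames, mean-squared-error objective, and the function names `sample_span_mask` and `masked_prediction_loss` are all assumptions chosen for illustration, and an identity function stands in for the encoder.

```python
import numpy as np


def sample_span_mask(num_frames, span_len=5, mask_ratio=0.3, rng=None):
    """Return a boolean mask of shape (num_frames,) made of contiguous spans.

    span_len and mask_ratio are illustrative defaults, not values from the paper.
    Spans may overlap, so the masked fraction can be below mask_ratio.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = np.zeros(num_frames, dtype=bool)
    num_spans = max(1, int(round(mask_ratio * num_frames / span_len)))
    starts = rng.integers(0, max(1, num_frames - span_len + 1), size=num_spans)
    for start in starts:
        mask[start:start + span_len] = True
    return mask


def masked_prediction_loss(target, prediction, mask):
    """Mean squared error computed over the masked frames only."""
    diff = (prediction[mask] - target[mask]) ** 2
    return float(diff.mean())


rng = np.random.default_rng(0)
spec = rng.standard_normal((100, 64))  # hypothetical spectrogram: (frames, bins)
mask = sample_span_mask(spec.shape[0], rng=rng)

corrupted = spec.copy()
corrupted[mask] = 0.0  # assumption: masked frames are zero-filled

# A real model would map `corrupted` to a reconstruction of `spec`.
# The identity stand-in below only demonstrates how the loss is evaluated.
prediction = corrupted
loss = masked_prediction_loss(spec, prediction, mask)
```

Restricting the loss to masked frames means the model gets no credit for copying visible input, so it has to infer the hidden frames from their context. That is the general idea behind masked prediction, independent of the specific choices assumed here.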
Taxonomy
Topics: Animal Vocal Communication and Behavior · Music and Audio Processing · Speech and Audio Processing
