Transferable Models for Bioacoustics with Human Language Supervision
David Robinson, Adelaide Robinson, Lily Akrapongpisak

TL;DR
BioLingual is a contrastive language-audio model trained on a large bioacoustic dataset. It supports zero-shot species identification and flexible text queries, and reaches state-of-the-art performance on animal sound tasks, with direct use in ecological monitoring.
Contribution
The paper introduces BioLingual, a contrastive language-audio model trained on the AnimalSpeak dataset, which supports species identification across taxa and flexible natural-language queries in bioacoustics.
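To make "contrastive language-audio" concrete, here is a minimal sketch of a symmetric contrastive objective of the kind commonly used for this family of models. It is an illustration under stated assumptions, not the authors' training code: the function names, the temperature value of 0.07, and the toy embeddings are all hypothetical, and the summary above does not specify the exact loss used.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    # Row i of audio_emb and row i of text_emb are assumed to be a matched pair.
    # The temperature is an illustrative value, not one taken from the paper.
    a = l2_normalize(audio_emb)
    t = l2_normalize(text_emb)
    logits = a @ t.T / temperature
    idx = np.arange(len(a))
    loss_audio_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_audio = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_audio_to_text + loss_text_to_audio)

# Toy embeddings: four audio clips and four captions in a 4-dimensional space.
audio = np.eye(4)
matched_text = np.eye(4)                       # caption i matches clip i
shuffled_text = np.roll(np.eye(4), 1, axis=0)  # every caption is misaligned

matched_loss = symmetric_contrastive_loss(audio, matched_text)
shuffled_loss = symmetric_contrastive_loss(audio, shuffled_text)
print(matched_loss, shuffled_loss)
```

The loss is near zero when each clip is closest to its own caption and large when the pairs are misaligned, which is the signal that pulls matching audio and text representations together during training.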
Findings
Identifies over a thousand species across taxa.
Achieves state-of-the-art results on nine bioacoustic tasks.
Enables zero-shot and text-based retrieval of animal sounds.
Abstract
Passive acoustic monitoring offers a scalable, non-invasive method for tracking global biodiversity and anthropogenic impacts on species. Although deep learning has become a vital tool for processing this data, current models are inflexible, typically cover only a handful of species, and are limited by data scarcity. In this work, we propose BioLingual, a new model for bioacoustics based on contrastive language-audio pretraining. We first aggregate bioacoustic archives into a language-audio dataset, called AnimalSpeak, with over a million audio-caption pairs holding information on species, vocalization context, and animal behavior. After training on this dataset to connect language and audio representations, our model can identify over a thousand species' calls across taxa, complete bioacoustic tasks zero-shot, and retrieve animal vocalization recordings from natural text queries. When…
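The zero-shot identification and text-based retrieval described in the abstract can both be expressed as a similarity search in a shared embedding space. The sketch below assumes audio clips and text prompts have already been embedded by a trained model; the embeddings, labels, and function names are hypothetical placeholders rather than BioLingual outputs or its API.

```python
import numpy as np

def cosine_similarity(a, b):
    # Pairwise cosine similarity between the rows of a and the rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def zero_shot_classify(audio_emb, prompt_emb, labels):
    # Assign each clip the label whose text prompt is most similar to it.
    sims = cosine_similarity(audio_emb, prompt_emb)
    return [labels[i] for i in sims.argmax(axis=1)]

def retrieve(query_emb, audio_emb, top_k=2):
    # Rank clips by similarity to a single text query embedding.
    sims = cosine_similarity(query_emb[None, :], audio_emb)[0]
    return np.argsort(-sims)[:top_k].tolist()

# Hypothetical labels and hand-made embeddings standing in for model outputs.
labels = ["humpback whale", "barn owl", "tree frog"]
prompt_emb = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
audio_emb = np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.2, 0.95],
                      [0.0, 0.8, 0.3]])

predictions = zero_shot_classify(audio_emb, prompt_emb, labels)
ranked = retrieve(prompt_emb[1], audio_emb, top_k=2)
print(predictions)  # ['humpback whale', 'tree frog', 'barn owl']
print(ranked)       # [2, 1]
```

No classifier is trained for the label set here: changing the species list only means embedding a different set of prompts, which is what makes the approach zero-shot.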
Taxonomy
Topics: Animal Vocal Communication and Behavior · Music and Audio Processing · Diverse Musicological Studies
