Transferable Models for Bioacoustics with Human Language Supervision

David Robinson; Adelaide Robinson; Lily Akrapongpisak

arXiv:2308.04978 · cs.LG · August 10, 2023

TL;DR

BioLingual is a contrastive language-audio model trained on AnimalSpeak, a large bioacoustic dataset of audio-caption pairs. It supports zero-shot species identification and retrieval of animal sounds from natural-language queries, and it achieves state-of-the-art performance on animal sound tasks, with applications to ecological monitoring.

Contribution

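The paper introduces BioLingual, a contrastive language-audio model trained on AnimalSpeak, a dataset of over a million audio-caption pairs aggregated from bioacoustic archives. The model supports species identification across taxa and flexible natural-language queries over bioacoustic recordings.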

Findings

01. Identifies over a thousand species across taxa.

02. Achieves state-of-the-art results on nine bioacoustic tasks.

03. Performs bioacoustic tasks zero-shot and retrieves animal vocalization recordings from natural-language text queries.
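
Zero-shot identification and text-based retrieval can both be framed as comparing embeddings in a shared language-audio space: classify a clip by picking the text prompt closest to it, or rank stored recordings by their closeness to a query. The following is a minimal sketch under that assumption, using precomputed embeddings. The caption prompts, recording IDs, and 3-D vectors are hypothetical stand-ins, not outputs of BioLingual or the API of its repository.

```python
import math

# Hypothetical 3-D embeddings standing in for the outputs of a trained text
# encoder (prompts) and audio encoder (recordings); real models use far
# larger dimensions.
LABEL_EMBS = {
    "the sound of a humpback whale": [0.9, 0.1, 0.0],
    "the sound of a common loon": [0.1, 0.9, 0.1],
    "the sound of a spring peeper": [0.0, 0.2, 0.9],
}
AUDIO_BANK = {
    "rec_001": [0.85, 0.15, 0.05],
    "rec_002": [0.1, 0.8, 0.3],
    "rec_003": [0.05, 0.1, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_classify(audio_emb, label_embs):
    """Zero-shot: return the text prompt whose embedding best matches the clip."""
    return max(label_embs, key=lambda label: cosine(audio_emb, label_embs[label]))


def retrieve(query_emb, audio_bank, k=2):
    """Text-to-audio retrieval: rank recordings by similarity to a query embedding."""
    ranked = sorted(
        audio_bank,
        key=lambda rid: cosine(query_emb, audio_bank[rid]),
        reverse=True,
    )
    return ranked[:k]
```

For example, `zero_shot_classify([0.8, 0.2, 0.1], LABEL_EMBS)` returns `"the sound of a humpback whale"`, and `retrieve(LABEL_EMBS["the sound of a common loon"], AUDIO_BANK)` returns `["rec_002", "rec_001"]`. Because no species-specific classifier head is trained, new species can be queried just by writing new prompts.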

Abstract

Passive acoustic monitoring offers a scalable, non-invasive method for tracking global biodiversity and anthropogenic impacts on species. Although deep learning has become a vital tool for processing this data, current models are inflexible, typically cover only a handful of species, and are limited by data scarcity. In this work, we propose BioLingual, a new model for bioacoustics based on contrastive language-audio pretraining. We first aggregate bioacoustic archives into a language-audio dataset, called AnimalSpeak, with over a million audio-caption pairs holding information on species, vocalization context, and animal behavior. After training on this dataset to connect language and audio representations, our model can identify over a thousand species' calls across taxa, complete bioacoustic tasks zero-shot, and retrieve animal vocalization recordings from natural text queries. When…
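
Contrastive language-audio pretraining typically connects the two modalities with a CLIP-style symmetric InfoNCE loss. Within a batch, each recording and its own caption are treated as a positive pair, and every other pairing serves as a negative. This page does not spell out BioLingual's exact objective, encoders, or temperature. The sketch below is therefore a generic version of that standard loss, and the function names, the toy 2-D embeddings, and the 0.07 temperature (CLIP's initial value) are assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(audio_embs, text_embs, temperature=0.07):
    """Generic CLIP/CLAP-style InfoNCE loss over a batch of (audio, caption) pairs.

    Pair i is the positive; all other captions (or clips) in the batch are
    negatives. temperature=0.07 is an assumed value, not taken from the paper.
    """
    a = [l2_normalize(v) for v in audio_embs]
    t = [l2_normalize(v) for v in text_embs]
    n = len(a)
    # logits[i][j] = cosine(audio_i, caption_j) / temperature
    logits = [
        [sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Audio-to-text direction: each row should concentrate on its diagonal entry.
    loss_a2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-audio direction: the same cross-entropy applied column-wise.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2a = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_a2t + loss_t2a) / 2
```

With two perfectly matched pairs, such as `symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])`, the loss is near zero. Swapping the two captions drives it to roughly `1 / temperature`, about 14.3. Minimizing this loss pulls each recording toward its own caption and pushes it away from the others, which is what makes the similarity-based zero-shot queries possible afterward.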

Code & Models

Repositories

david-rx/biolingual
pytorch · Official

Taxonomy

Topics: Animal Vocal Communication and Behavior · Music and Audio Processing · Diverse Musicological Studies