NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
David Robinson, Marius Miron, Masato Hagiwara, Benno Weck, Sara Keen, Milad Alizadeh, Gagan Narula, Matthieu Geist, Olivier Pietquin

TL;DR
NatureLM-audio is a pioneering audio-language foundation model tailored for bioacoustics, enabling improved zero-shot classification and generalization across diverse animal vocalization tasks, thereby supporting conservation and biodiversity efforts.
Contribution
It introduces the first bioacoustics-specific audio-language foundation model trained on curated data, demonstrating transfer learning from speech and music to bioacoustics and establishing new state-of-the-art results.
Findings
Sets new state-of-the-art on bioacoustics tasks
Shows effective transfer from speech and music models
Achieves promising generalization to unseen species
Abstract
Large language models (LLMs) prompted with text and audio have achieved state-of-the-art performance across various auditory tasks, including speech, music, and general audio, showing emergent abilities on unseen tasks. However, their potential has yet to be fully demonstrated in bioacoustics tasks, such as detecting animal vocalizations in large recordings, classifying rare and endangered species, and labeling context and behavior -- tasks that are crucial for conservation, biodiversity monitoring, and animal behavior studies. In this work, we present NatureLM-audio, the first audio-language foundation model specifically designed for bioacoustics. Our training dataset consists of carefully curated text-audio pairs spanning bioacoustics, speech, and music, designed to address the field's limited availability of annotated data. We demonstrate successful transfer of learned…
Peer Reviews
Decision: ICLR 2025 Poster
An incredible collection of datasets, and careful curation. A lot of ancillary code for use of the data in various learning tasks.
Unclear from the presentation if the authors intend to make the dataset widely available, and under what license.
1. Addresses an important topic, both for the ML research community (since audio, and especially computational bioacoustics, is a hard problem) and for society. 2. Collects a comprehensive training dataset and extends an existing evaluation benchmark with additional tasks. 3. The performance improvements compared to a model not trained on bioacoustics data (SALMONN) support the claim that this domain needs its own foundation model.
1. Soundness of results: Your presented results show only a minor improvement compared to BioLingual (which also presents zero-shot results on BEANS; their numbers sometimes differ, why?), so what is the benefit of your approach and, more particularly, does integrating an LLM have a benefit? Or is it the different training dataset? Or the audio encoder (BEATs vs. HTS-AT)? 2. No further details for replication of the experiments are given, e.g. pretrained models or the list of species which were hold ou
1. The introduction of NatureLM, the first audio-language model specifically designed for bioacoustics, represents a promising new direction for incorporating language models into biodiversity monitoring. 2. The development of the BEANS-Zero benchmark extends the original BEANS benchmark by introducing new tasks, such as call-type prediction, life-stage classification, individual counting, and open-ended audio captioning. These additions have the potential to advance bioacoustics research and e
1. Incorrect Terminology 1.1. The introduction describes BioLingual as self-supervised; however, the supervision is derived from text generated based on class labels. I recommend referring to it as supervised learning with language-based supervision for greater clarity and accuracy. 1.2. Both BioLingual and AVES are described in the paper as foundation models, but this classification may be misleading. BioLingual and AVES are trained on datasets with fewer than 2 million samples, while models
Taxonomy
Topics: Animal Vocal Communication and Behavior · Music and Audio Processing · Diverse Musicological Studies
