Foundation Models for Bioacoustics -- a Comparative Review

Raphael Schwinger; Paria Vali Zadeh; Lukas Rauch; Mats Kurz; Tom Hauschild; Sam Lapp; Sven Tomforde

arXiv:2508.01277 · cs.SD · March 31, 2026

TL;DR

This review evaluates large-scale pretrained bioacoustic models, analyzing their transferability, training data, and architecture, and presents empirical results comparing their performance on biodiversity monitoring benchmarks.

Contribution

It provides a systematic overview and empirical comparison of bioacoustic foundation models, highlighting their transferability and guiding model selection for new tasks.

Findings

01. Perch 2.0 achieves the highest BirdSet score and strong linear probing results.

02. BirdMAE is the strongest model among probing-based strategies on BirdSet.

03. Self-supervised models trained on AudioSet outperform specialized bird sound models.

Abstract

Automated bioacoustic analysis is essential for biodiversity monitoring and conservation, requiring advanced deep learning models that can adapt to diverse bioacoustic tasks. This article presents a comprehensive review of large-scale pretrained bioacoustic foundation models and systematically investigates their transferability across multiple bioacoustic classification tasks. We overview bioacoustic representation learning by analysing pretraining data sources and benchmarks. On this basis, we review bioacoustic foundation models, dissecting the models' training data, preprocessing, augmentations, architecture, and training paradigm. Additionally, we conduct an extensive empirical study of selected models on the BEANS and BirdSet benchmarks, evaluating generalisability under linear and attentive probing. Our experimental analysis reveals that Perch 2.0 achieves the highest BirdSet…
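
The abstract contrasts linear and attentive probing without describing either. The sketch below is one common reading of the two terms, not the authors' implementation. It assumes a frozen encoder that emits a sequence of token embeddings per audio clip, that linear probing mean-pools those tokens before a linear classifier, and that attentive probing replaces the mean with a softmax-weighted average driven by a single learned query vector. The function names, the single-head pooling, and the random stand-in weights are all hypothetical; only the forward pass is shown, with no training loop.

```python
import math
import random


def mean_pool(tokens):
    """Average a sequence of token embeddings into one vector."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def attentive_pool(tokens, query):
    """Softmax-weighted average of tokens; weights come from a query vector.

    Assumption: single-head, single-query attention pooling. Real attentive
    probes may use multi-head attention and extra projections.
    """
    dim = len(query)
    scores = [sum(q * x for q, x in zip(query, t)) / math.sqrt(dim) for t in tokens]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
    return pooled, weights


def linear_head(vec, weight, bias):
    """Plain linear classifier: one logit per class."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


random.seed(0)
num_tokens, dim, num_classes = 6, 4, 3

# Stand-in for frozen encoder output (hypothetical values, not real embeddings).
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]

# Parameters that would be trained in a probe; random here for illustration.
query = [random.gauss(0, 1) for _ in range(dim)]
weight = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_classes)]
bias = [0.0] * num_classes

# Linear probing: fixed pooling, only the linear head is trainable.
linear_logits = linear_head(mean_pool(tokens), weight, bias)

# Attentive probing: the pooling itself is trainable through the query.
pooled, attn = attentive_pool(tokens, query)
attentive_logits = linear_head(pooled, weight, bias)
```

In this sketch the encoder is frozen in both cases. The only difference is whether the pooling step has trainable parameters, so a zero query reduces attentive pooling to the plain mean.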
