Foundation Models for Bioacoustics -- a Comparative Review
Raphael Schwinger, Paria Vali Zadeh, Lukas Rauch, Mats Kurz, Tom Hauschild, Sam Lapp, Sven Tomforde

TL;DR
This review evaluates large-scale pretrained bioacoustic models, analyzing their transferability, training data, and architecture, and presents empirical results comparing their performance on the BEANS and BirdSet benchmarks.
Contribution
It provides a systematic overview and empirical comparison of bioacoustic foundation models, highlighting their transferability and guiding model selection for new tasks.
Findings
Perch 2.0 achieves the highest BirdSet score and strong linear probing results.
BirdMAE excels among probing-based approaches on BirdSet.
Self-supervised models trained on AudioSet outperform specialized bird sound models.
Abstract
Automated bioacoustic analysis is essential for biodiversity monitoring and conservation, requiring advanced deep learning models that can adapt to diverse bioacoustic tasks. This article presents a comprehensive review of large-scale pretrained bioacoustic foundation models and systematically investigates their transferability across multiple bioacoustic classification tasks. We overview bioacoustic representation learning by analysing pretraining data sources and benchmarks. On this basis, we review bioacoustic foundation models, dissecting the models' training data, preprocessing, augmentations, architecture, and training paradigm. Additionally, we conduct an extensive empirical study of selected models on the BEANS and BirdSet benchmarks, evaluating generalisability under linear and attentive probing. Our experimental analysis reveals that Perch 2.0 achieves the highest BirdSet…
