Perch 2.0: The Bittern Lesson for Bioacoustics

Bart van Merriënboer; Vincent Dumoulin; Jenny Hamer; Lauren Harrell; Andrea Burns; Tom Denton

arXiv:2508.04665·cs.LG·January 6, 2026

TL;DR

Perch 2.0 is a supervised bioacoustics model pre-trained on multi-taxa data. It achieves state-of-the-art results on the BirdSet and BEANS benchmarks and transfers well to marine tasks despite having almost no marine training data.

Contribution

It extends training from avian species to a large multi-taxa dataset and combines self-distillation through a prototype-learning classifier with a new source-prediction criterion, improving bioacoustic classification and transfer learning performance.

Findings

01

State-of-the-art results on BirdSet and BEANS benchmarks

02

Outperforms marine models with minimal marine data

03

Fine-grained species classification is robust for pre-training

Abstract

Perch is a performant pre-trained model for bioacoustics. It was trained in supervised fashion, providing both off-the-shelf classification scores for thousands of vocalizing species as well as strong embeddings for transfer learning. In this new release, Perch 2.0, we expand from training exclusively on avian species to a large multi-taxa dataset. The model is trained with self-distillation using a prototype-learning classifier as well as a new source-prediction training criterion. Perch 2.0 obtains state-of-the-art performance on the BirdSet and BEANS benchmarks. It also outperforms specialized marine models on marine transfer learning tasks, despite having almost no marine training data. We present hypotheses as to why fine-grained species classification is a particularly robust pre-training task for bioacoustics.
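To make the transfer-learning use of the embeddings concrete, the following is a minimal sketch of a linear probe: a softmax classifier fitted on top of frozen embeddings. It is an illustration under stated assumptions, not the authors' evaluation code. The embeddings are synthetic stand-ins generated with NumPy, and `train_linear_probe` and `predict` are hypothetical helper names that do not come from the paper or from any Perch API.

```python
import numpy as np


def train_linear_probe(embeddings, labels, num_classes, lr=0.1, steps=200):
    """Fit a softmax linear classifier on frozen embeddings (hypothetical helper)."""
    n, d = embeddings.shape
    weights = np.zeros((d, num_classes))
    bias = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = embeddings @ weights + bias
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy with respect to the logits.
        grad = (probs - one_hot) / n
        weights -= lr * (embeddings.T @ grad)
        bias -= lr * grad.sum(axis=0)
    return weights, bias


def predict(embeddings, weights, bias):
    """Return the highest-scoring class index for each embedding."""
    return np.argmax(embeddings @ weights + bias, axis=1)


# Assumption: synthetic clusters stand in for embeddings from a frozen
# pre-trained model; no real audio or model is involved here.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 16)) * 3.0
labels = rng.integers(0, 3, size=300)
embeddings = centers[labels] + rng.normal(size=(300, 16))

# Standardize features so plain gradient descent behaves well.
embeddings = (embeddings - embeddings.mean(axis=0)) / embeddings.std(axis=0)

weights, bias = train_linear_probe(embeddings, labels, num_classes=3)
accuracy = float(np.mean(predict(embeddings, weights, bias) == labels))
print(f"training accuracy of the linear probe: {accuracy:.2f}")
```

In practice the synthetic array would be replaced by embeddings computed once from the pre-trained model, so that only the small linear layer is trained.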

Peer Reviews

Decision·ICLR 2026 Conference Desk Rejected Submission

Reviewer 01·Rating 6·Confidence 4

Strengths

- The paper combines supervised training, prototype-based distillation, and auxiliary objectives in a clear, effective design.
- Experiments cover a wide range of benchmarks and are technically solid.
- The model transfers well across domains while staying compact and efficient.
- Strong performance under linear probing shows the embeddings are general and practical to use.
- The model architecture is optimized to be employed in real-world systems, so as to be as light as possible.

Weaknesses

- Unclear contribution of components to performance: The paper introduces several methodological components (e.g., multi-source mixup, self-distillation, and an auxiliary source-prediction loss). However, their individual contributions are not clearly isolated, as the paper does not provide controlled ablation studies. In particular, the role of the windowing strategy and the handling of label noise across heterogeneous sources remains insufficiently explained. This raises concerns regarding rep

Reviewer 02·Rating 4·Confidence 4

Strengths

- The paper is well-written and easy to follow.
- The model achieves state-of-the-art results on multiple datasets.

Weaknesses

While the developed model shows strong performance, the question remains: what contributes to its strong performance? As multiple changes were made compared with the BirdSet and Perch 1.0 baselines, it is hard to assess the importance of each individual change. Most importantly, it is unclear how much the additional training data contributes to the performance increase relative to the architectural changes and auxiliary losses. Ablation studies could help clarify this. This is especially importa

Reviewer 03·Rating 2·Confidence 4

Strengths

- **Clear presentation**: The paper is well-written and easy to follow.
- **Comprehensive evaluation framework**: The inclusion of different model selection tasks in the evaluation is nice, as it helps to identify both strengths and limitations of the model.
- **Pragmatic focus on supervised learning**: The decision to focus on supervised learning rather than following the current trend toward self-supervised methods is commendable. This work demonstrates that supervised approaches remain compet

Weaknesses

**Unclear novelty and insufficient differentiation from prior work**

The authors claim several contributions, including a novel mixup procedure, a self-distillation process, and a self-supervised auxiliary loss. However, the paper lacks clarity in distinguishing what constitutes genuinely novel contributions versus adaptations of existing techniques. For example, while the authors propose generalizing mixup to more than two components, they do not adequately discuss related work that already ex
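As an illustration of what generalizing mixup to more than two components can look like, the sketch below mixes N waveforms with weights drawn from a symmetric Dirichlet distribution. This is an example built on assumptions, not the procedure from the paper: the function name `mix_multiple`, the Dirichlet parameter `alpha`, and the choice to use the union of the component labels as the multi-hot target are all introduced here for illustration only.

```python
import numpy as np


def mix_multiple(waveforms, multi_hot_labels, rng, alpha=1.0):
    """Mix N equal-length waveforms with Dirichlet weights (illustrative only)."""
    waveforms = np.asarray(waveforms, dtype=np.float64)
    multi_hot_labels = np.asarray(multi_hot_labels)
    n = waveforms.shape[0]
    # Non-negative mixing weights that sum to 1, one per component.
    mix_weights = rng.dirichlet(np.full(n, alpha))
    # Weighted sum over the component axis: (n,) x (n, samples) -> (samples,).
    mixed = np.tensordot(mix_weights, waveforms, axes=1)
    # Assumption: the target marks every class present in any component.
    mixed_labels = multi_hot_labels.max(axis=0)
    return mixed, mixed_labels, mix_weights


rng = np.random.default_rng(0)
waveforms = rng.normal(size=(3, 8))  # three toy "recordings" of 8 samples each
labels = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
])
mixed, mixed_labels, mix_weights = mix_multiple(waveforms, labels, rng)
print(mixed.shape, mixed_labels.tolist())
```

Classic two-component mixup is the special case of two components, where the Dirichlet distribution reduces to a Beta distribution over a single mixing coefficient.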

Reviewer 04·Rating 4·Confidence 4

Strengths

1. *Clarity and motivation:* The paper is well-written and very easy to follow. The work is well-motivated by addressing real-world challenges faced by practitioners, such as the need for strong, generalizable embeddings from smaller models that do not require extensive fine-tuning.
2. *Methodological combination:* The work combines existing techniques (self-distillation, source prediction, prototype learning) into a single training framework. This combination is well-suited to the problem of fi

Weaknesses

While the proposed method combination is interesting and the results are strong, the paper is limited by a lack of empirical validation. The core issue is an absence of ablation studies, which makes it impossible to attribute the performance gains to the specific contributions claimed by the authors (method-based, data-based, etc.).

**1. Confounded contributions and lack of ablations:** The core weakness is that the paper simultaneously introduces multiple changes to the previous model (a larg

Videos

Can AI help to save endangered birds?·YouTube