BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
Lukas Rauch, Raphael Schwinger, Moritz Wirth, René Heinrich, Denis Huseljic, Marek Herde, Jonas Lange, Stefan Kahl, Bernhard Sick, Sven Tomforde, Christoph Scholz

TL;DR
BirdSet is a comprehensive, large-scale dataset for avian audio classification, significantly surpassing existing datasets in size and diversity, enabling advanced research in bioacoustics and machine learning.
Contribution
Introduces BirdSet, a large-scale, versatile benchmark dataset for avian bioacoustics, with extensive labeled data and evaluation scenarios, hosted openly for the research community.
Findings
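BirdSet surpasses AudioSet in size and diversity, with over 6,800 recording hours from nearly 10,000 classes for training.
Benchmarking six well-known DL models reveals their strengths and weaknesses across the evaluation scenarios.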
Dataset and code are publicly available for reproducibility.
Abstract
Deep learning (DL) has greatly advanced audio classification, yet the field is limited by the scarcity of large-scale benchmark datasets that have propelled progress in other domains. While AudioSet is a pivotal step to bridge this gap as a universal-domain dataset, its restricted accessibility and limited range of evaluation use cases challenge its role as the sole resource. Therefore, we introduce BirdSet, a large-scale benchmark dataset for audio classification focusing on avian bioacoustics. BirdSet surpasses AudioSet with over 6,800 recording hours from nearly 10,000 classes for training and more than 400 hours across eight strongly labeled evaluation datasets. It serves as a versatile resource for use cases such as multi-label classification, covariate shift, or self-supervised learning. We benchmark six well-known DL…
Peer Reviews
Decision: ICLR 2025 Spotlight
Careful curation of the dataset: types of collections, varying types of bird calls from the same species; includes both soundscapes and individual recordings; comparisons with other datasets. Inclusion of benchmark code for future researchers to use as a baseline. Permissive licensing. A large-scale project.
Too many uncommon acronyms, which force the reader to keep going back and forth; shortening the text this way was unnecessary. The acronyms are used in the figures as well. Figures and captions are supposed to stand on their own.
1. Originality BirdSet represents a novel contribution to multi-label audio classification. With close to 10,000 classes BirdSet provides a benchmark to develop scalable methods capable of handling extreme class diversity with large imbalance. BirdSet also addresses critical machine learning challenges such as covariate shift, where the testing distribution diverges from the training distribution reflecting real-world environmental shifts in field data. 2. Clarity The paper is clear and well
I didn’t find any weaknesses in this paper.
1. Well written and easy to read. 2. Thoroughly explains the challenges experienced not only in avian bioacoustics, such as covariate shift and mismatch in focal and soundscape recordings, but also in curating, and developing a dataset of such a size and scale. 3. The pain points addressed in the paper are very real: poor availability and accessibility of AudioSet, lack of a unified benchmark suite for evaluating segment and event-based bioacoustics tasks, and mismatch between train and test ti
To me, the paper, in several places, tries to pose BirdSet as a replacement for AudioSet and that it should be the exemplary benchmark for evaluating audio classification models, with statements such as "Avian bioacoustics exemplifies challenges in audio classification...", and how curated datasets like AudioSet and ESC-50 do not represent real-world complexities. Pain points mentioned w.r.t. AudioSet are all very real, but AudioSet is a much broader dataset than BirdSet. Several people, in indu
Taxonomy
Topics: Animal Vocal Communication and Behavior · Marine animal studies overview · Underwater Acoustics Research
Methods: Fragmentation
