AudioProtoPNet: An interpretable deep learning model for bird sound classification
René Heinrich, Lukas Rauch, Bernhard Sick, Christoph Scholz

TL;DR
AudioProtoPNet is an interpretable deep learning model for bird sound classification that uses prototype learning to provide explanations and insights, outperforming previous models on multiple datasets.
Contribution
This paper introduces AudioProtoPNet, a novel interpretable deep learning approach for multi-label bird sound classification using prototype learning.
Findings
Achieves a 7.1% higher AUROC than the state-of-the-art model Perch
Achieves a 16.7% higher cmAP than Perch
Provides explanations for model decisions and insights into bird vocalizations
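For readers unfamiliar with the cmAP metric named in the findings, the following is a minimal sketch of class-wise mean average precision: average precision is computed per species and then averaged over species. The function names and the toy data are illustrative assumptions, not the paper's evaluation code, and the handling of classes without positives is one possible convention.

```python
def average_precision(labels, scores):
    """AP for one class: mean of precision@k over the ranks k that hold a positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    # Assumption: a class with no positive examples contributes 0.0;
    # other implementations skip such classes instead.
    return sum(precisions) / len(precisions) if precisions else 0.0


def cmap(label_matrix, score_matrix):
    """Mean of per-class AP; rows are recordings, columns are species."""
    n_classes = len(label_matrix[0])
    aps = []
    for c in range(n_classes):
        col_labels = [row[c] for row in label_matrix]
        col_scores = [row[c] for row in score_matrix]
        aps.append(average_precision(col_labels, col_scores))
    return sum(aps) / n_classes


# Toy multi-label data (hypothetical): 3 recordings, 2 species.
labels = [[1, 0], [0, 1], [1, 1]]
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.1]]
value = cmap(labels, scores)
```

Because every species counts equally in the average, rare species influence cmAP as much as common ones.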
Abstract
Deep learning models have significantly advanced acoustic bird monitoring by being able to recognize numerous bird species based on their vocalizations. However, traditional deep learning models are black boxes that provide no insight into their underlying computations, limiting their usefulness to ornithologists and machine learning engineers. Explainable models could facilitate debugging, knowledge discovery, trust, and interdisciplinary collaboration. This study introduces AudioProtoPNet, an adaptation of the Prototypical Part Network (ProtoPNet) for multi-label bird sound classification. It is an inherently interpretable model that uses a ConvNeXt backbone to extract embeddings, with the classification layer replaced by a prototype learning classifier trained on these embeddings. The classifier learns prototypical patterns of each bird species' vocalizations from spectrograms of…
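The prototype classifier described in the abstract can be illustrated with a small sketch. It assumes cosine similarity between patch embeddings and prototypes, max-pooling over patches, and one sigmoid per species for multi-label output, as is common in ProtoPNet-style models. The abstract as shown here does not specify the similarity function, the number of prototypes, or the training procedure, so all names, shapes, and values below are hypothetical, and plain lists stand in for the ConvNeXt embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def prototype_scores(patches, prototypes):
    """For each prototype, keep its best similarity over all spectrogram patches."""
    return [max(cosine(p, proto) for p in patches) for proto in prototypes]


def classify(patches, prototypes, weights, bias):
    """Multi-label prediction: one independent sigmoid per species."""
    scores = prototype_scores(patches, prototypes)
    probs = []
    for w_row, b in zip(weights, bias):
        logit = sum(w * s for w, s in zip(w_row, scores)) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return scores, probs


# Toy stand-in for backbone output: 3 patch embeddings of dimension 2.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Two hypothetical prototypes, one per species.
prototypes = [[2.0, 0.0], [0.0, -1.0]]
# Assumption: each species reads only its own prototype.
weights = [[4.0, 0.0], [0.0, 4.0]]
bias = [-2.0, -2.0]

scores, probs = classify(patches, prototypes, weights, bias)
```

In this sketch the patch that maximizes a prototype's similarity is what an explanation would point to: it marks the region of the spectrogram that most resembles the learned pattern.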
Taxonomy
Topics: Animal Vocal Communication and Behavior · Music and Audio Processing · Diverse Musicological Studies
Methods: ConvNeXt
