From Birdsong to Rumbles: Classifying Elephant Calls with Out-of-Species Embeddings
Christiaan M. Geldenhuys, Thomas R. Niesler

TL;DR
Pretrained acoustic embeddings from out-of-species audio models can classify elephant calls effectively without fine-tuning the embedding model. This makes them a practical basis for bioacoustic monitoring when annotated data and compute are limited.
Contribution
This study shows that frozen, pretrained embeddings from out-of-domain and out-of-species models classify elephant vocalisations at a level approaching that of end-to-end supervised models, with no domain-specific training of the embedding model.
Findings
Perch 2.0 achieves AUCs of 0.849 (African elephants) and 0.936 (Asian elephants).
Intermediate transformer layers encode sufficient information for classification.
Models truncated at the second layer retain their classification performance while using only 10% of the original parameters.
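The AUC figures above are areas under the ROC curve. The sketch below computes AUC for a single binary decision using the rank (Mann-Whitney) formulation. The helpers `roc_auc` and `macro_ovr_auc` are hypothetical. Averaging one-vs-rest AUCs across call types is also an assumption, because this summary does not say how the reported AUCs are aggregated across classes.

```python
import numpy as np


def roc_auc(labels, scores):
    """Binary ROC AUC via the Mann-Whitney formulation.

    Equals the probability that a randomly chosen positive example receives a
    higher score than a randomly chosen negative one, counting ties as 1/2.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive score with every negative score (O(P*N) memory).
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def macro_ovr_auc(labels, prob_matrix):
    """Hypothetical multi-class summary: mean of one-vs-rest AUCs per call type."""
    labels = np.asarray(labels)
    probs = np.asarray(prob_matrix, dtype=float)
    per_class = [roc_auc(labels == k, probs[:, k]) for k in range(probs.shape[1])]
    return float(np.mean(per_class))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect separation. On that scale, 0.849 and 0.936 indicate that positives are ranked above negatives in most pairs.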
Abstract
We show that pretrained acoustic embeddings classify elephant vocalisations at a level approaching that of end-to-end supervised neural networks, without any fine-tuning of the embedding model. This result is of practical importance because annotated bioacoustic data are scarce and costly to obtain, leaving conventional supervised approaches prone to overfitting and to poor generalisation under domain shift. A broad range of embedding models drawn from general audio, speech, and bioacoustic domains is evaluated, all of which are either out-of-domain (containing no bioacoustic data) or out-of-species (containing no elephant call data). The embedding networks themselves remain fixed; only the lightweight downstream classifiers, which include a linear model and several small neural networks, are trained. Among the models considered, Perch 2.0 achieves the best cross-validated…
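The setup in the abstract, a frozen embedding network feeding a lightweight linear classifier, can be sketched as a linear probe. This is a minimal sketch under several assumptions. First, embeddings are taken to have been extracted already from a frozen model and stored as a NumPy array. Second, the linear model is multinomial logistic regression trained by full-batch gradient descent, which is only one plausible choice. Third, the function names `train_linear_probe` and `predict_proba` and all hyperparameters are hypothetical. Finally, the synthetic clustered vectors stand in for real Perch 2.0 (or other) embeddings. The paper's actual classifier settings are not given here.

```python
import numpy as np


def train_linear_probe(embeddings, labels, n_classes, lr=0.1, epochs=500, l2=1e-3, seed=0):
    """Hypothetical linear probe: softmax regression on frozen embeddings.

    Only the weights w and bias b are learned; the embeddings are never updated.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels, dtype=int)
    n, d = x.shape
    # Standardise features with training-set statistics only.
    mean, std = x.mean(axis=0), x.std(axis=0) + 1e-8
    x = (x - mean) / std
    w = rng.normal(scale=0.01, size=(d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad + l2 * w)  # L2 penalty on weights only
        b -= lr * grad.sum(axis=0)
    return {"w": w, "b": b, "mean": mean, "std": std}


def predict_proba(probe, embeddings):
    """Class probabilities for new embeddings using the trained probe."""
    x = (np.asarray(embeddings, dtype=float) - probe["mean"]) / probe["std"]
    logits = x @ probe["w"] + probe["b"]
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Placeholder data: 300 synthetic 64-dim "embeddings" in 3 clusters (not real calls).
rng = np.random.default_rng(1)
centers = rng.normal(size=(3, 64)) * 2.0
labels = rng.integers(0, 3, size=300)
emb = centers[labels] + rng.normal(size=(300, 64))

probe = train_linear_probe(emb[:200], labels[:200], n_classes=3)
acc = float((predict_proba(probe, emb[200:]).argmax(axis=1) == labels[200:]).mean())
print(f"held-out accuracy on synthetic data: {acc:.2f}")
```

Because the embedding network is never updated, it can be run once and its outputs cached. Each downstream classifier, whether this linear probe or a small neural network, then trains in seconds. The same property makes it cheap to compare embeddings taken from different intermediate layers.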
