Comparing Self-Supervised Learning Models Pre-Trained on Human Speech and Animal Vocalizations for Bioacoustics Processing
Eklavya Sarkar, Mathew Magimai.-Doss

TL;DR
This study compares self-supervised learning models pre-trained on human speech with models pre-trained on animal vocalizations on bioacoustic tasks. It finds only marginal performance differences, which highlights the robustness of speech-pretrained models.
Contribution
The paper compares SSL models pre-trained on speech with models pre-trained on animal vocalizations, using three bioacoustic datasets and two bioacoustic tasks, and shows that speech-pretrained models are generally sufficient.
Findings
Bioacoustic pre-training offers marginal improvements over speech pre-training.
Fine-tuning on ASR tasks has mixed effects on bioacoustic performance.
Speech-pretrained SSL models are robust and often adequate without extensive fine-tuning.
Abstract
Self-supervised learning (SSL) foundation models have emerged as powerful, domain-agnostic, general-purpose feature extractors applicable to a wide range of tasks. Such models pre-trained on human speech have demonstrated high transferability for bioacoustic processing. This paper investigates (i) whether SSL models pre-trained directly on animal vocalizations offer a significant advantage over those pre-trained on speech, and (ii) whether fine-tuning speech-pretrained models on automatic speech recognition (ASR) tasks can enhance bioacoustic classification. We conduct a comparative analysis using three diverse bioacoustic datasets and two different bioacoustic tasks. Results indicate that pre-training on bioacoustic data provides only marginal improvements over speech-pretrained models, with comparable performance in most scenarios. Fine-tuning on ASR tasks yields mixed outcomes,…
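The comparison described in the abstract treats each pre-trained model as a feature extractor and scores the resulting features on a classification task. The following is a minimal sketch of that general evaluation pattern, not the authors' pipeline. Everything in it is an assumption for illustration: `make_fake_extractor` is a hypothetical stand-in that emits synthetic Gaussian frames instead of real SSL features, mean pooling is one common way to get a clip-level embedding, and a nearest-centroid rule replaces whatever classifier the paper actually uses.

```python
import random
from statistics import fmean


def mean_pool(frames):
    """Average frame-level vectors (T x D) into a single D-dimensional embedding."""
    return [fmean(column) for column in zip(*frames)]


def fit_centroids(embeddings, labels):
    """Store one mean embedding per class; a simple stand-in for a trained classifier."""
    grouped = {}
    for emb, lab in zip(embeddings, labels):
        grouped.setdefault(lab, []).append(emb)
    return {lab: mean_pool(embs) for lab, embs in grouped.items()}


def predict(centroids, emb):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: sqdist(centroids[lab], emb))


def accuracy(centroids, embeddings, labels):
    """Fraction of embeddings whose predicted label matches the true label."""
    hits = sum(predict(centroids, e) == y for e, y in zip(embeddings, labels))
    return hits / len(labels)


def make_fake_extractor(seed, separation):
    """Hypothetical frozen 'model': emits synthetic Gaussian frames, not real SSL features."""
    rng = random.Random(seed)

    def extract(label, n_frames=20, dim=8):
        return [[rng.gauss(label * separation, 1.0) for _ in range(dim)]
                for _ in range(n_frames)]
    return extract


def evaluate(extract, classes=(0, 1, 2), n_train=20, n_test=10):
    """Embed synthetic clips, fit on the training split, score on the held-out split."""
    train_x, train_y, test_x, test_y = [], [], [], []
    for lab in classes:
        for i in range(n_train + n_test):
            emb = mean_pool(extract(lab))
            if i < n_train:
                train_x.append(emb)
                train_y.append(lab)
            else:
                test_x.append(emb)
                test_y.append(lab)
    return accuracy(fit_centroids(train_x, train_y), test_x, test_y)


if __name__ == "__main__":
    # Two hypothetical extractors scored with the same protocol.
    for name, sep in [("extractor_a", 1.0), ("extractor_b", 0.8)]:
        print(name, evaluate(make_fake_extractor(seed=0, separation=sep)))
```

To run this on real data, `extract` would be replaced by a function that passes an audio clip through a frozen pre-trained model and returns its frame-level outputs. The pooling, classifier, and data split must be held fixed across models so that any difference in accuracy is attributable to the features.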
Taxonomy
Topics: Animal Vocal Communication and Behavior
