Comparing Self-Supervised Learning Models Pre-Trained on Human Speech and Animal Vocalizations for Bioacoustics Processing

Eklavya Sarkar; Mathew Magimai.-Doss

arXiv:2501.05987 · cs.LG · January 22, 2025

TL;DR

This study compares self-supervised learning models pre-trained on human speech and animal vocalizations to determine their effectiveness in bioacoustic tasks, finding minimal performance differences and highlighting the robustness of speech-pretrained models.

Contribution

The paper compares SSL models pre-trained on speech against those pre-trained on animal sounds for bioacoustic applications, and shows that speech-pretrained models are generally sufficient.

Findings

01. Bioacoustic pre-training offers marginal improvements over speech pre-training.
02. Fine-tuning on ASR tasks has mixed effects on bioacoustic performance.
03. Speech-pretrained SSL models are robust and often adequate without extensive fine-tuning.

Abstract

Self-supervised learning (SSL) foundation models have emerged as powerful, domain-agnostic, general-purpose feature extractors applicable to a wide range of tasks. Such models pre-trained on human speech have demonstrated high transferability for bioacoustic processing. This paper investigates (i) whether SSL models pre-trained directly on animal vocalizations offer a significant advantage over those pre-trained on speech, and (ii) whether fine-tuning speech-pretrained models on automatic speech recognition (ASR) tasks can enhance bioacoustic classification. We conduct a comparative analysis using three diverse bioacoustic datasets and two different bioacoustic tasks. Results indicate that pre-training on bioacoustic data provides only marginal improvements over speech-pretrained models, with comparable performance in most scenarios. Fine-tuning on ASR tasks yields mixed outcomes,…

Code & Models

Repositories

idiap/ssl-human-animal (PyTorch, official)

Taxonomy

Topics: Animal Vocal Communication and Behavior