On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis

Eklavya Sarkar; Mathew Magimai.-Doss

arXiv:2407.16417 · cs.SD · July 25, 2024

TL;DR

This study evaluates the effectiveness of speech and audio foundation models for classifying marmoset calls, finding that higher-bandwidth models and pre-training on speech or general audio improve classification accuracy over traditional spectral methods.

Contribution

The study systematically compares speech and general audio pre-trained models across different pre-training bandwidths for marmoset call classification, highlighting their utility and limitations.

Findings

1. Higher-bandwidth models improve classification performance.
2. Pre-training on speech and pre-training on general audio yield similar results.
3. Foundation-model features outperform the traditional spectral baseline methods.
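
One way to see why bandwidth can matter is the Nyquist relation: a model pre-trained on audio sampled at fs Hz only sees content up to fs/2 Hz, so the 4, 8 and 16 kHz pre-training bandwidths named in the abstract correspond to minimum sampling rates of 8, 16 and 32 kHz. The minimal sketch below, with hypothetical helper names, checks whether a spectral component lies inside each band and where it would fold to if a call were downsampled without an anti-aliasing filter. The 10 kHz component is an illustrative value, not a measurement from the paper.

```python
# Minimal sketch of the bandwidth / sampling-rate relation (Nyquist).
# Helper names are hypothetical; only the Nyquist relation is assumed.

BANDWIDTHS_HZ = (4_000, 8_000, 16_000)  # pre-training bandwidths named in the abstract


def sample_rate_for_bandwidth(bandwidth_hz: int) -> int:
    """Lowest sampling rate whose Nyquist frequency (fs / 2) equals bandwidth_hz."""
    return 2 * bandwidth_hz


def representable(component_hz: float, bandwidth_hz: int) -> bool:
    """True if a spectral component lies strictly below the model's band edge."""
    return 0 <= component_hz < bandwidth_hz


def apparent_frequency(component_hz: float, sample_rate_hz: int) -> float:
    """Frequency a real sinusoid appears at after sampling at sample_rate_hz
    with no anti-aliasing filter: it folds back into [0, sample_rate_hz / 2]."""
    return abs(component_hz - sample_rate_hz * round(component_hz / sample_rate_hz))


# Illustrative 10 kHz component (hypothetical value, not taken from the paper).
for bw in BANDWIDTHS_HZ:
    fs = sample_rate_for_bandwidth(bw)
    print(bw, fs, representable(10_000, bw), apparent_frequency(10_000, fs))
# prints:
# 4000 8000 False 2000
# 8000 16000 False 6000
# 16000 32000 True 10000
```

With a proper anti-aliasing filter the out-of-band component is removed rather than folded; either way it is unavailable to a model pre-trained at that bandwidth.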

Abstract

Marmoset monkeys encode vital information in their calls and serve as a surrogate model for neurobiologists to understand the evolutionary origins of human vocal communication. Marmoset calls have traditionally been analyzed with signal processing-based features, but recent approaches have utilized self-supervised models pre-trained on human speech for feature extraction, capitalizing on their ability to learn a signal's intrinsic structure independently of its acoustic domain. However, the utility of such foundation models for marmoset call analysis remains unclear in terms of multi-class classification, bandwidth, and pre-training domain. This study assesses feature representations derived from the speech and general audio domains, across pre-training bandwidths of 4, 8, and 16 kHz, for marmoset call-type and caller classification tasks. Results show that models with higher bandwidth improve performance, and…
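
The abstract describes using pre-trained models for feature extraction in call-type and caller classification. Below is a minimal sketch of such a probing pipeline, assuming the pre-trained model is used as a fixed (frozen) feature extractor exposed as a function that maps a waveform to frame-level vectors: frames are mean-pooled into one clip vector, and a nearest-centroid classifier is fit on the pooled vectors. `toy_embed`, the labels and the toy waveforms are hypothetical stand-ins, and nearest-centroid is a deliberately simple classifier chosen for illustration, not necessarily the one used in the paper.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Sequence[float], str]  # (waveform, label: call type or caller ID)


def toy_embed(wav: Sequence[float], frame: int = 4) -> List[Vector]:
    """Hypothetical stand-in for a frozen pre-trained model: returns one
    [mean, energy] vector per non-overlapping frame. A real pipeline would
    take hidden states from a speech or audio foundation model instead."""
    frames = [wav[i:i + frame] for i in range(0, len(wav) - frame + 1, frame)]
    return [[fmean(f), fmean(x * x for x in f)] for f in frames]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a list of equal-length vectors (e.g. T frames -> one clip vector)."""
    dim = len(vectors[0])
    return [fmean(v[d] for v in vectors) for d in range(dim)]


def fit_centroids(clips: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Nearest-centroid probe: store the mean pooled vector of each class."""
    by_class: Dict[str, List[Vector]] = {}
    for vec, label in zip(clips, labels):
        by_class.setdefault(label, []).append(vec)
    return {label: mean_pool(vecs) for label, vecs in by_class.items()}


def predict(centroids: Dict[str, Vector], vec: Vector) -> str:
    """Return the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, centroids[c])))


def evaluate(embed: Callable[[Sequence[float]], List[Vector]],
             train: Sequence[Example], test: Sequence[Example]) -> float:
    """Pool features per clip, fit the probe on train, report test accuracy."""
    train_x = [mean_pool(embed(wav)) for wav, _ in train]
    centroids = fit_centroids(train_x, [label for _, label in train])
    correct = sum(predict(centroids, mean_pool(embed(wav))) == label for wav, label in test)
    return correct / len(test)


# Toy data: two hypothetical classes that differ only in loudness.
train_set = [([0.1, -0.1] * 4, "type_a"), ([1.0, -1.0] * 4, "type_b")]
test_set = [([0.2, -0.2] * 4, "type_a"), ([0.9, -0.9] * 4, "type_b")]
print(evaluate(toy_embed, train_set, test_set))  # prints 1.0
```

Replacing `toy_embed` with a real model's layer outputs, and the toy labels with call types or caller IDs, gives the two tasks the abstract names; running the same probe over models of different pre-training domains and bandwidths is one way to set up a like-for-like comparison.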

Code & Models

Repositories

idiap/speech-utility-bioacoustics
PyTorch · Official
