Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?

Eklavya Sarkar; Mathew Magimai.-Doss

arXiv:2305.14035 · cs.LG · June 9, 2023 · 1 citation

TL;DR

This study demonstrates that self-supervised neural representations trained on human speech can effectively distinguish individual Marmoset callers, highlighting their cross-domain transferability to bio-acoustic analysis without additional training.

Contribution

It shows that SSL models pre-trained on human speech can be directly applied to bio-acoustic signals for caller identification, without fine-tuning.

Findings

01. SSL embeddings encode individual caller information
02. Models successfully distinguish Marmoset callers without fine-tuning
03. Cross-domain transferability of SSL representations is effective

Abstract

Self-supervised learning (SSL) models use only the intrinsic structure of a given signal, independent of its acoustic domain, to extract essential information from the input to an embedding space. This implies that the utility of such representations is not limited to modeling human speech alone. Building on this understanding, this paper explores the cross-transferability of SSL neural representations learned from human speech to analyze bio-acoustic signals. We conduct a caller discrimination analysis and a caller detection study on Marmoset vocalizations using eleven SSL models pre-trained with various pretext tasks. The results show that the embedding spaces carry meaningful caller information and can successfully distinguish the individual identities of Marmoset callers without fine-tuning. This demonstrates that representations pre-trained on human speech can be effectively…
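The general idea of caller detection on frozen embeddings can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the synthetic frame-level vectors stand in for the output of a pre-trained SSL model, and the mean pooling, the nearest-centroid classifier, and every function name below are assumptions made for the example.

```python
import numpy as np


def mean_pool(frames: np.ndarray) -> np.ndarray:
    """Collapse (num_frames, dim) frame-level embeddings into one (dim,) vector."""
    return frames.mean(axis=0)


def fit_centroids(X: np.ndarray, y: np.ndarray):
    """Compute one mean embedding per caller label."""
    labels = np.unique(y)
    centroids = np.stack([X[y == label].mean(axis=0) for label in labels])
    return labels, centroids


def predict(X: np.ndarray, labels: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each embedding to the caller with the nearest centroid."""
    # Squared Euclidean distance from every sample to every centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dists.argmin(axis=1)]


def make_synthetic_calls(rng, num_callers=3, calls_per_caller=20, dim=16):
    """Hypothetical stand-in for SSL features: each caller gets its own offset."""
    offsets = rng.normal(scale=3.0, size=(num_callers, dim))
    X, y = [], []
    for caller in range(num_callers):
        for _ in range(calls_per_caller):
            num_frames = rng.integers(20, 60)  # calls vary in length
            frames = offsets[caller] + rng.normal(size=(num_frames, dim))
            X.append(mean_pool(frames))  # one fixed-size vector per call
            y.append(caller)
    return np.stack(X), np.array(y)


rng = np.random.default_rng(0)
X, y = make_synthetic_calls(rng)

# Hold out every second call for evaluation; no feature extractor is trained.
train = np.arange(len(y)) % 2 == 0
labels, centroids = fit_centroids(X[train], y[train])
pred = predict(X[~train], labels, centroids)
accuracy = float((pred == y[~train]).mean())
print(f"held-out caller accuracy: {accuracy:.2f}")
```

With real data, `make_synthetic_calls` would be replaced by a pass of each vocalization through a frozen pre-trained model, keeping the rest unchanged. The point of the sketch is only that the classifier sits on top of fixed embeddings, which is what "without fine-tuning" means here.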

Code & Models

Repositories

idiap/ssl-caller-detection (PyTorch, official)

Taxonomy

Topics: Animal Vocal Communication and Behavior · Speech and Audio Processing · Speech Recognition and Synthesis