Exploring bat song syllable representations in self-supervised audio encoders
Marianne de Heer Kloots, Mirjam Knörnschild

TL;DR
This study investigates how self-supervised audio encoders trained on human-generated sounds represent bat song syllables. Of the encoders compared, those pre-trained on human speech produce the most distinctive representations of different syllable types, a first step towards cross-species transfer learning in bat bioacoustics.
Contribution
It shows that models pre-trained on human speech are promising tools for analyzing bat vocalizations, taking first steps towards cross-species transfer learning in bioacoustics.
Findings
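Among the self-supervised audio encoders analyzed, models pre-trained on human speech produce the most distinctive representations of different bat song syllable types.
The results form first steps towards applying cross-species transfer learning in bat bioacoustics.
The analysis also contributes to understanding how audio encoder models process out-of-distribution signals.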
Abstract
How well can deep learning models trained on human-generated sounds distinguish between another species' vocalization types? We analyze the encoding of bat song syllables in several self-supervised audio encoders, and find that models pre-trained on human speech generate the most distinctive representations of different syllable types. These findings form first steps towards the application of cross-species transfer learning in bat bioacoustics, as well as an improved understanding of out-of-distribution signal processing in audio encoder models.
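One way to make "distinctive representations" concrete is a separability score over per-syllable embeddings. The sketch below is illustrative only: the abstract does not say which measure the authors use, so the ratio of between-type to within-type distance, the function names, the labels "type_1" and "type_2", and the toy 2-D embeddings are all assumptions chosen for this example.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distinctiveness(embeddings, labels):
    """Mean distance between syllable-type centroids, divided by the mean
    distance of each embedding to its own type's centroid.
    Higher values mean the types are easier to tell apart.
    (Hypothetical measure, not taken from the paper.)"""
    by_type = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        by_type[label].append(vector)
    centroids = {label: centroid(vs) for label, vs in by_type.items()}

    within = [math.dist(v, centroids[l]) for v, l in zip(embeddings, labels)]
    names = sorted(centroids)
    between = [
        math.dist(centroids[a], centroids[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    return (sum(between) / len(between)) / (sum(within) / len(within))


# Toy 2-D "embeddings" standing in for the outputs of two hypothetical encoders.
labels = ["type_1", "type_1", "type_2", "type_2"]
encoder_a = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]  # well separated
encoder_b = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]]    # overlapping

score_a = distinctiveness(encoder_a, labels)
score_b = distinctiveness(encoder_b, labels)
print(score_a, score_b)
```

In practice the embeddings would come from an encoder's hidden states pooled over each syllable, and a comparison across encoders would rank them by a score of this kind; a classifier probe or clustering metric could serve the same purpose.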
Taxonomy
Topics: Animal Vocal Communication and Behavior · Bat Biology and Ecology Studies · Marine animal studies overview
