Towards Leveraging Sequential Structure in Animal Vocalizations

Eklavya Sarkar; Mathew Magimai.-Doss

arXiv:2511.10190·cs.LG·November 14, 2025

TL;DR

This study explores whether vector-quantized token sequences, derived from self-supervised speech model representations, can capture the sequential structure of animal vocalizations and use it for call-type and caller classification.

Contribution

The paper applies vector quantization to self-supervised speech model representations so that the temporal order of sub-units within an animal call is kept as a discrete token sequence rather than averaged away.
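To make the idea concrete, here is a minimal sketch of nearest-codebook vector quantization, assuming frame-level embeddings are already available as lists of floats. The function names `quantize_frames` and `mean_pool`, the toy two-dimensional frames, and the two-entry codebook are all hypothetical illustrations, not the authors' implementation or data. The sketch shows why order matters: two toy calls with the same frames in opposite order are indistinguishable after temporal averaging, but produce different token sequences.

```python
from typing import List, Sequence


def quantize_frames(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> List[int]:
    """Map each frame to the index of its nearest codebook vector.

    Distance is squared Euclidean; ties go to the lowest index.
    """
    tokens = []
    for frame in frames:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, codebook[k])),
        )
        tokens.append(nearest)
    return tokens


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frames over time, which discards their order."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


# Hypothetical toy data: two "calls" made of the same frames in opposite order.
codebook = [[0.0, 0.0], [1.0, 1.0]]
call_a = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
call_b = list(reversed(call_a))

tokens_a = quantize_frames(call_a, codebook)
tokens_b = quantize_frames(call_b, codebook)
same_average = mean_pool(call_a) == mean_pool(call_b)
```

In this toy case `same_average` is true while `tokens_a` and `tokens_b` differ, which is the property that sequence-based methods rely on. A learned codebook (for example from k-means or a trained quantizer) would replace the hand-written one here.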

Findings

01 Token sequences can discriminate call-types and callers.

02 Sequence-based features improve classification performance.

03 Vector-quantized tokens hold promise for bioacoustic analysis.

Abstract

Animal vocalizations contain sequential structures that carry important communicative information, yet most computational bioacoustics studies average the extracted frame-level features across the temporal axis, discarding the order of the sub-units within a vocalization. This paper investigates whether discrete acoustic token sequences, derived through vector quantization and Gumbel-softmax vector quantization of extracted self-supervised speech model representations, can effectively capture and leverage temporal information. To that end, pairwise distance analysis of token sequences generated from HuBERT embeddings shows that they can discriminate call-types and callers across four bioacoustics datasets. Sequence classification experiments using $k$-Nearest Neighbour with Levenshtein distance show that the vector-quantized token sequences yield reasonable call-type and caller…
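The classification setup named in the abstract, $k$-Nearest Neighbour over Levenshtein distance, can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's code: the functions `levenshtein` and `knn_predict`, the value of `k`, the integer token sequences, and the labels `type_a` and `type_b` are all hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, token_a in enumerate(a, start=1):
        current = [i]
        for j, token_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if token_a == token_b else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def knn_predict(
    query: Sequence[int],
    train_sequences: List[Sequence[int]],
    train_labels: List[str],
    k: int = 3,
) -> str:
    """Return the majority label among the k training sequences closest to query."""
    ranked = sorted(
        range(len(train_sequences)),
        key=lambda i: levenshtein(query, train_sequences[i]),
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical token sequences and labels, for illustration only.
train_sequences = [[1, 1, 2], [1, 2, 2], [5, 5, 6], [5, 6, 6]]
train_labels = ["type_a", "type_a", "type_b", "type_b"]

predicted = knn_predict([1, 1, 2, 2], train_sequences, train_labels, k=3)
```

Because Levenshtein distance compares sequences of different lengths directly, no padding or temporal averaging is needed, so the order of tokens within each vocalization is used as-is.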

Taxonomy

Topics: Animal Vocal Communication and Behavior · Neuroendocrine regulation and behavior · Marine animal studies overview