State Space Models for Bioacoustics: A Comparative Evaluation with Transformers

Chengyu Tang; Sanjeev Baskiyar

arXiv:2512.03563·cs.SD·April 21, 2026

TL;DR

This paper introduces BioMamba, a Mamba-based audio representation model for wildlife sounds. On the BEANS benchmark it performs comparably to the Transformer-based AVES model while using significantly less VRAM, which makes it a candidate for real-world environmental monitoring.

Contribution

The study presents BioMamba, a Mamba-based audio representation model pre-trained with self-supervised learning on a large audio corpus, and compares its performance and VRAM consumption with the Transformer-based AVES model on the BEANS benchmark.

Findings

01

BioMamba performs comparably to AVES, the state-of-the-art Transformer-based model, across the classification and detection tasks of the BEANS benchmark.

02

BioMamba consumes significantly less VRAM than the Transformer-based AVES model.

03

BioMamba shows promise for real-world environmental monitoring applications.

Abstract

In this study, we evaluate the efficacy of the Mamba architecture for bioacoustics by introducing BioMamba, a Mamba-based audio representation model for wildlife sounds. We pre-train BioMamba using self-supervised learning on a large audio corpus and evaluate it on the BEANS benchmark across diverse classification and detection tasks. Compared to the state-of-the-art Transformer-based model (AVES), BioMamba achieves comparable performance while significantly reducing VRAM consumption. Our results demonstrate Mamba's potential as a computationally efficient alternative for real-world environmental monitoring.
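
The abstract does not say where the VRAM savings come from. One commonly cited structural difference, offered here as an assumption and not as the authors' explanation, is that a state space layer carries a fixed-size hidden state across time steps, whereas naive self-attention materializes a score matrix that grows quadratically with sequence length. The sketch below is a simplified, non-selective diagonal state space recurrence in plain Python. It is not BioMamba's layer (Mamba makes the parameters input-dependent), and the names `ssm_scan`, `attention_score_entries`, and `ssm_state_entries` are hypothetical helpers introduced only for illustration.

```python
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a diagonal linear state space recurrence over a 1-D signal.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state)
    y_t = sum(c * h_t)

    Simplification: a, b, c are fixed here; Mamba computes them from the input.
    """
    state = [0.0] * len(a)  # fixed-size state, independent of len(x)
    outputs = []
    for x_t in x:
        # Update every state channel from its previous value and the new sample.
        state = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, state, b)]
        # Project the state down to a single output value.
        outputs.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return outputs


def attention_score_entries(seq_len: int) -> int:
    """Entries in one naive self-attention score matrix (seq_len x seq_len)."""
    return seq_len * seq_len


def ssm_state_entries(state_dim: int) -> int:
    """Entries in the recurrent state carried between steps."""
    return state_dim


if __name__ == "__main__":
    # Impulse response of a one-channel SSM with decay 0.5.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
    # Score-matrix entries grow quadratically; the carried state does not grow.
    for seq_len in (1_000, 10_000):
        print(seq_len, attention_score_entries(seq_len), ssm_state_entries(16))
```

These counts cover only one intermediate quantity per layer. Actual VRAM use during training or inference also depends on stored activations, batch size, precision, and kernel implementation, so the numbers illustrate scaling behavior and say nothing about the measurements reported in the paper.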
