State Space Models for Bioacoustics: A Comparative Evaluation with Transformers
Chengyu Tang, Sanjeev Baskiyar

TL;DR
This paper introduces BioMamba, a Mamba-based model for wildlife sound analysis. It achieves accuracy comparable to Transformer-based models while being more resource-efficient, which makes it well suited to environmental monitoring.
Contribution
The study presents BioMamba, a novel Mamba-based architecture for bioacoustics, and demonstrates its effectiveness and efficiency relative to Transformer-based models.
Findings
BioMamba achieves similar performance to AVES on the BEANS benchmark.
BioMamba significantly reduces VRAM consumption compared to Transformer models.
BioMamba shows promise for real-world environmental monitoring applications.
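One commonly cited reason Mamba-style models can use less memory than Transformers is sketched below as back-of-envelope arithmetic. It is not a model of BioMamba's measured VRAM. It counts only the attention score matrix that a naive self-attention layer materializes (one sequence-length by sequence-length matrix per head) and compares it with the fixed-size recurrent state a selective state space layer keeps at inference. It ignores weights, activations, and optimized kernels such as FlashAttention. The function names and the example sizes (sequence length, head count, `d_model`, `d_state`) are hypothetical.

```python
def attention_score_elements(seq_len: int, num_heads: int) -> int:
    # Naive self-attention stores one seq_len x seq_len score matrix per head,
    # so this count grows quadratically with the number of audio frames.
    return num_heads * seq_len * seq_len


def ssm_state_elements(d_model: int, d_state: int) -> int:
    # A recurrent selective SSM keeps one d_state-sized vector per channel;
    # this count does not depend on sequence length at all.
    return d_model * d_state


# Hypothetical sizes, chosen only for illustration.
short_clip = attention_score_elements(seq_len=1000, num_heads=8)
long_clip = attention_score_elements(seq_len=2000, num_heads=8)
state = ssm_state_elements(d_model=768, d_state=16)
```

With these numbers, doubling the clip length multiplies the attention score count by four (`long_clip / short_clip` is `4.0`), while `state` stays at 12288 elements for any clip length.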
Abstract
In this study, we evaluate the efficacy of the Mamba architecture for bioacoustics by introducing BioMamba, a Mamba-based audio representation model for wildlife sounds. We pre-train BioMamba with self-supervised learning on a large audio corpus and evaluate it on the BEANS benchmark across diverse classification and detection tasks. Compared with AVES, a state-of-the-art Transformer-based model, BioMamba achieves comparable performance while significantly reducing VRAM consumption. Our results demonstrate Mamba's potential as a computationally efficient alternative for real-world environmental monitoring.
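To make the Mamba architecture more concrete, here is a minimal single-channel sketch of a selective state space recurrence in pure Python. It is not BioMamba's implementation. The scalar projections `w_b`, `w_c`, `w_delta`, and `b_delta` are hypothetical stand-ins for Mamba's learned linear layers, and real implementations run a parallel, hardware-aware scan over many channels at once. The sketch shows two properties. First, the step size Δ and the matrices B and C depend on the current input, which is the "selective" part. Second, the hidden state `h` has a fixed size however long the audio sequence is.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)); keeps the step size delta positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    x: List[float],
    a: List[float],    # diagonal of A; negative entries make the state decay
    w_b: List[float],  # hypothetical projection: B_t = w_b * x_t
    w_c: List[float],  # hypothetical projection: C_t = w_c * x_t
    w_delta: float,    # hypothetical projection: delta_t = softplus(w_delta * x_t + b_delta)
    b_delta: float,
    d: float = 0.0,    # skip-connection weight D
) -> List[float]:
    n = len(a)
    h = [0.0] * n  # fixed-size hidden state, independent of len(x)
    ys: List[float] = []
    for x_t in x:
        # Input-dependent step size: this is what makes the SSM "selective".
        delta = softplus(w_delta * x_t + b_delta)
        y_t = d * x_t
        for i in range(n):
            a_bar = math.exp(delta * a[i])  # zero-order-hold discretization of A
            b_bar = delta * w_b[i] * x_t    # simplified discretization: B_bar = delta * B_t
            h[i] = a_bar * h[i] + b_bar * x_t
            y_t += (w_c[i] * x_t) * h[i]    # readout y_t = C_t . h_t (+ D * x_t)
        ys.append(y_t)
    return ys
```

Because each output depends only on the current input and the carried state, the layer is causal and can process an arbitrarily long recording with constant state memory, for example `selective_scan([0.2, -0.1, 0.5], a=[-1.0, -0.5], w_b=[1.0, 0.5], w_c=[0.3, 0.2], w_delta=0.7, b_delta=0.1)`.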
