BarcodeMamba: State Space Models for Biodiversity Analysis
Tiancheng Gao, Graham W. Taylor

TL;DR
BarcodeMamba introduces a state space model-based foundation model for DNA barcode analysis. It outperforms previous models such as BarcodeBERT in species identification accuracy while using fewer parameters, especially in recognizing unseen species.
Contribution
We developed BarcodeMamba, a novel state space model for DNA barcodes, demonstrating superior performance and efficiency over existing models like BarcodeBERT in biodiversity analysis.
Findings
BarcodeMamba achieves 99.2% accuracy on species-level identification for seen species.
It attains 70.2% genus-level accuracy on unseen species while using fewer parameters than BarcodeBERT.
BarcodeMamba outperforms BarcodeBERT in both accuracy and parameter efficiency.
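Unseen species have no species label available at training time, so they are scored at the genus level instead. One common way to do this is a 1-nearest-neighbour probe on frozen embeddings. This summary does not state BarcodeMamba's exact evaluation protocol, so the sketch below is an assumption for illustration only. The function names and toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn1_accuracy(
    train_emb: List[Sequence[float]],
    train_labels: List[str],
    test_emb: List[Sequence[float]],
    test_labels: List[str],
) -> float:
    """Hypothetical 1-NN probe: label each test embedding with the genus of its
    most similar reference embedding, then return the fraction that is correct."""
    correct = 0
    for emb, label in zip(test_emb, test_labels):
        # Index of the reference barcode whose embedding is closest to this one.
        best = max(range(len(train_emb)), key=lambda i: cosine(emb, train_emb[i]))
        correct += train_labels[best] == label
    return correct / len(test_labels)
```

For example, with reference embeddings `[[1, 0], [0, 1]]` labelled `["GenusA", "GenusB"]`, the test embeddings `[[0.9, 0.1], [0.2, 0.8]]` with true genera `["GenusA", "GenusB"]` both match the correct genus, giving an accuracy of 1.0.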
Abstract
DNA barcodes are crucial in biodiversity analysis for building automatic identification systems that recognize known species and discover unseen species. Unlike human genome modeling, barcode-based invertebrate identification is challenging because of the vast diversity of species and their taxonomic complexity. Among Transformer-based foundation models, BarcodeBERT excelled in species-level identification of invertebrates, highlighting the effectiveness of self-supervised pretraining on barcode-specific datasets. Recently, structured state space models (SSMs) have emerged, with a time complexity that scales sub-quadratically with the context length. SSMs provide an efficient parameterization of sequence modeling relative to attention-based architectures. Given the success of Mamba and Mamba-2 in natural language, we designed BarcodeMamba, a performant and efficient foundation model for DNA…
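The efficiency claim rests on the fact that an SSM processes a sequence with a recurrence rather than all-pairs attention. The sketch below shows a minimal diagonal, time-invariant discrete SSM, h_t = a ⊙ h_{t-1} + b · x_t and y_t = Σ c ⊙ h_t, applied in a single left-to-right pass. Its cost grows linearly with sequence length, whereas self-attention grows quadratically. This is not BarcodeMamba's implementation. Mamba-style layers additionally make the parameters input-dependent ("selective") and use hardware-aware parallel scans. The function name and toy parameters are illustrative assumptions.

```python
from typing import List


def ssm_scan(x: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Run a diagonal, time-invariant discrete SSM over a 1-D input sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over N state channels)
    y_t = sum(c * h_t)
    """
    n = len(a)
    if not (len(b) == len(c) == n):
        raise ValueError("a, b and c must share the same state size N")
    h = [0.0] * n  # hidden state starts at zero
    ys = []
    for x_t in x:  # one pass over the sequence: O(L * N) time for length L
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys
```

With a single state channel (`a=[0.5]`, `b=[1.0]`, `c=[1.0]`), the impulse input `[1.0, 0.0, 0.0]` yields `[1.0, 0.5, 0.25]`. The state decays geometrically, which shows how the recurrence carries information forward along the sequence without revisiting earlier positions.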
Taxonomy
Topics: Species Distribution and Climate Change · Fish Ecology and Management Studies · Aquatic Invertebrate Ecology and Behavior
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
