eccDNAMamba: A Pre-Trained Model for Ultra-Long eccDNA Sequence Analysis
Zhenke Liu, Jien Li, Ziqi Zhang

TL;DR
eccDNAMamba is a pre-trained model for efficient, full-length analysis of extrachromosomal circular DNA (eccDNA) sequences, intended to support the study of their regulatory roles in cancer.
Contribution
It introduces the first bidirectional state-space encoder for circular DNA, supporting sequences up to 200 Kbp with linear-time complexity.
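To make the idea of a bidirectional, linear-time encoder concrete, here is a minimal sketch in plain Python. It is not the eccDNAMamba architecture: it replaces the model's state-space layers with a toy fixed-decay recurrence, and the names `linear_scan`, `bidirectional_scan`, and `decay` are hypothetical. It only illustrates how one forward and one reverse pass give each position context from both sides at a cost proportional to sequence length.

```python
def linear_scan(xs, decay=0.9):
    """Toy linear recurrence h_t = decay * h_{t-1} + x_t (one O(n) pass).

    Assumption: a fixed scalar decay stands in for a learned state-space layer.
    """
    h, out = 0.0, []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out


def bidirectional_scan(xs, decay=0.9):
    """Pair a left-to-right state with a right-to-left state at each position."""
    fwd = linear_scan(xs, decay)
    # Scan the reversed input, then flip back so index i lines up with fwd[i].
    bwd = linear_scan(xs[::-1], decay)[::-1]
    return list(zip(fwd, bwd))


signal = [1.0, 0.0, 0.0, 2.0]
states = bidirectional_scan(signal, decay=0.5)
# states[i] holds (summary of signal[:i+1], summary of signal[i:])
```

Each position is visited exactly twice, so the work grows linearly with length, in contrast to the pairwise comparisons of quadratic attention.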
Findings
Achieves strong classification performance on two real-world datasets
Scales to sequences up to 200 Kbp in length
Provides a robust framework for circular genome modeling
Abstract
Extrachromosomal circular DNA (eccDNA) plays key regulatory roles and contributes to oncogene overexpression in cancer through high-copy amplification and long-range interactions. Despite advances in modeling, no pre-trained models currently support full-length circular eccDNA for downstream analysis. Existing genomic models are either limited to single-nucleotide resolution or hindered by the inefficiency of the quadratic attention mechanism. Here, we introduce eccDNAMamba, the first bidirectional state-space encoder tailored for circular DNA sequences. It combines forward and reverse passes for full-context representation learning with linear-time complexity, and preserves circular structure through a novel augmentation strategy. Tested on two real-world datasets, eccDNAMamba achieves strong classification performance and scales to sequences up to 200 Kbp, offering a robust and…
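The abstract does not spell out the augmentation strategy, so the following sketch is an assumption about what circularity-preserving preprocessing can look like in general, not the authors' method. It shows two generic techniques: appending the head of the sequence to its tail so that motifs spanning the junction stay contiguous, and rotating the start point, since a circular molecule has no canonical origin. The helper names `wrap_circular` and `rotate` and the `overlap` parameter are hypothetical.

```python
def wrap_circular(seq, overlap):
    """Append the first `overlap` bases to the end of a linearized circle.

    Assumption: this is a generic trick, not the strategy used by eccDNAMamba.
    """
    k = min(overlap, len(seq))
    return seq + seq[:k]


def rotate(seq, offset):
    """Return the same circular sequence read from a different start position."""
    if not seq:
        return seq
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


ecc = "ACGTTGCA"                 # toy circular sequence, linearized arbitrarily
wrapped = wrap_circular(ecc, 3)  # "ACGTTGCAACG"
rotated = rotate(ecc, 5)         # "GCAACGTT"

# The 4-mer "CAAC" spans the end-to-start junction: it is absent from the
# plain linear string but visible once the head is appended or the origin moves.
junction_kmer = "CAAC"
```

A model that only ever sees `ecc` as a linear string would miss `junction_kmer`; either transformation exposes it without altering the underlying molecule.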
Taxonomy
Topics: Environmental DNA in Biodiversity Studies · Genomics and Phylogenetic Studies · Molecular Biology Techniques and Applications
