scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection
Zhen Yuan, Shaoqing Jiao, Yihang Xiao, Jiajie Peng

TL;DR
scMamba is a scalable foundation model that integrates single-cell multi-omics data without prior feature selection, using patch-based tokenization and contrastive learning to improve downstream analyses such as clustering, annotation, and trajectory inference.
Contribution
scMamba introduces a patch-based cell tokenization strategy and a contrastive learning method for single-cell multi-omics integration, avoiding feature selection and preserving genomic positional information.
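To make the contrastive idea concrete, here is a minimal sketch of a symmetric InfoNCE-style loss between two modality embeddings of the same cells. This is a generic formulation, not scMamba's actual objective, which this summary does not specify. The function name `symmetric_info_nce`, the `rna_emb`/`atac_emb` arguments, and the `temperature` value are all hypothetical.

```python
import numpy as np

def symmetric_info_nce(rna_emb, atac_emb, temperature=0.1):
    """Generic symmetric InfoNCE loss (an illustrative assumption, not the paper's loss).

    Row i of rna_emb and row i of atac_emb are assumed to come from the same cell.
    """
    # L2-normalize so the dot product is cosine similarity.
    rna = rna_emb / np.linalg.norm(rna_emb, axis=1, keepdims=True)
    atac = atac_emb / np.linalg.norm(atac_emb, axis=1, keepdims=True)
    logits = rna @ atac.T / temperature  # (n_cells, n_cells) similarity matrix

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row; the matching cell is on the diagonal.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the two retrieval directions (modality A to B, and B to A).
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))

# Toy check: correctly paired embeddings should score a lower loss than mispaired ones.
paired = np.eye(4)
aligned_loss = symmetric_info_nce(paired, paired)
shuffled_loss = symmetric_info_nce(paired, np.roll(paired, 1, axis=0))
```

Minimizing a loss of this shape pulls the two views of each cell together and pushes views of different cells apart, which is the usual mechanism behind cross-modality alignment.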
Findings
Outperforms existing methods in biological variation preservation
Enhances alignment across omics layers
Improves clustering, annotation, and trajectory inference
Abstract
The advent of single-cell multi-omics technologies has enabled the simultaneous profiling of diverse omics layers within individual cells. Integrating such multimodal data provides unprecedented insights into cellular identity, regulatory processes, and disease mechanisms. However, it remains challenging, as current methods often rely on selecting highly variable genes or peaks during preprocessing, which may inadvertently discard crucial biological information. Here, we present scMamba, a foundation model designed to integrate single-cell multi-omics data without the need for prior feature selection while preserving genomic positional information. scMamba introduces a patch-based cell tokenization strategy that treats genomic regions as words (tokens) and cells as sentences. Building upon the concept of state space duality, scMamba distills rich biological insights from…
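The "regions as words, cells as sentences" idea can be illustrated with a minimal sketch, assuming each cell is a count vector whose features are already sorted by genomic position and that a patch is a fixed-size run of consecutive features. The helper `patchify_cell`, the `patch_size` parameter, and the zero-padding rule are hypothetical; the abstract does not give the model's actual patch definition.

```python
import numpy as np

def patchify_cell(counts, patch_size):
    """Split one cell's position-ordered feature vector into fixed-size patches.

    Hypothetical helper: each returned row is one "word" (a contiguous genomic
    patch) and the full array is the "sentence" for the cell.
    """
    counts = np.asarray(counts, dtype=np.float32)
    n_features = counts.shape[0]
    n_patches = -(-n_features // patch_size)  # ceiling division
    # Zero-pad the tail so the last patch has full length (an assumption).
    padded = np.zeros(n_patches * patch_size, dtype=np.float32)
    padded[:n_features] = counts
    return padded.reshape(n_patches, patch_size)

# Toy cell with 10 position-ordered features, split into patches of 4.
toy_cell = np.arange(10)
tokens = patchify_cell(toy_cell, patch_size=4)
print(tokens.shape)  # (3, 4)
```

Because every feature lands in some patch, no gene or peak is dropped by a highly-variable-feature filter, and the patch order retains the genomic ordering of the input.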
Taxonomy
Topics: Gene Regulatory Network Analysis · Bioinformatics and Genomic Networks · Microbial Metabolic Engineering and Bioproduction
Methods: Contrastive Learning · Feature Selection
