scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics
Qifeng Zhou, Lei Yu, Yuzhi Guo, Yuwei Miao, Hehuan Ma, Wenliang Zhong, Lin Xu, Junzhou Huang

TL;DR
scpFormer is a transformer-based foundation model, pre-trained on over 390 million cells, that enables unified, panel-agnostic analysis of single-cell proteomics data and supports integration, clustering, and biomarker discovery.
Contribution
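scpFormer introduces a continuous, sequence-anchored tokenization and an open-vocabulary architecture for single-cell proteomics. Together these map variable antibody panels into a shared semantic space without discretizing expression values, which supports data integration and biological interpretation.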
Findings
Performs competitively in large-scale batch integration and clustering.
Enables in silico panel expansion for better biological manifold reconstruction.
Supports transfer learning to bulk-omics for applications like drug response prediction.
Abstract
The integration of single-cell proteomic data is often hindered by the fragmented nature of targeted antibody panels. To address this limitation, we introduce scpFormer, a transformer-based foundation model designed for single-cell proteomics. Pre-trained on over 390 million cells, scpFormer replaces standard index-based tokenization with a continuous, sequence-anchored approach. By combining Evolutionary Scale Modeling (ESM) with value-aware expression embeddings, it dynamically maps variable panels into a shared semantic space without artificial discretization. We demonstrate that scpFormer generates global cell representations that perform competitively in large-scale batch integration and unsupervised clustering. Moreover, its open-vocabulary architecture facilitates in silico panel expansion, assisting in the reconstruction of biological manifolds in sparse clinical datasets.…
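The continuous, sequence-anchored tokenization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions made for illustration. The ESM embedding is replaced by a deterministic pseudo-random vector keyed by protein name (`sequence_embedding` is a hypothetical stand-in). The value embedding is a tiny untrained MLP. The two parts are combined by addition, and the widths `D_SEQ` and `D_MODEL` are arbitrary. The sketch shows only the general idea: each token is anchored to the protein's identity rather than to a vocabulary index, and expression values enter as continuous numbers rather than bins, so cells measured with different panels land in the same token space.

```python
from __future__ import annotations

import hashlib

import numpy as np

D_SEQ = 32    # width of the stand-in sequence embedding (assumption)
D_MODEL = 16  # width of the shared token space (assumption)

# Untrained, randomly initialised weights; a real model would learn these.
_rng = np.random.default_rng(0)
W_SEQ = _rng.normal(scale=0.1, size=(D_SEQ, D_MODEL))
W_VAL1 = _rng.normal(scale=0.1, size=(1, D_MODEL))
B_VAL1 = np.zeros(D_MODEL)
W_VAL2 = _rng.normal(scale=0.1, size=(D_MODEL, D_MODEL))


def sequence_embedding(protein_name: str) -> np.ndarray:
    """Hypothetical stand-in for a protein language model embedding.

    Returns a deterministic pseudo-random vector keyed by the protein name,
    so the same protein always gets the same anchor regardless of panel.
    """
    digest = hashlib.sha256(protein_name.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "little")
    return np.random.default_rng(seed).normal(size=D_SEQ)


def value_embedding(values: np.ndarray) -> np.ndarray:
    """Map continuous expression values of shape (n,) to (n, D_MODEL), no binning."""
    hidden = np.tanh(values[:, None] @ W_VAL1 + B_VAL1)
    return hidden @ W_VAL2


def tokenize_cell(panel: list[str], expression) -> np.ndarray:
    """Build one token per measured protein: projected anchor plus value embedding."""
    values = np.asarray(expression, dtype=float)
    if values.shape != (len(panel),):
        raise ValueError("expression must contain one value per protein in the panel")
    anchors = np.stack([sequence_embedding(name) for name in panel])
    return anchors @ W_SEQ + value_embedding(values)
```

A short usage example with two made-up panels of different sizes: both produce tokens of the same width, and a protein measured at the same value in both panels yields the same token.

```python
tokens_a = tokenize_cell(["CD3", "CD4", "CD8"], [1.2, 0.0, 3.4])
tokens_b = tokenize_cell(["CD4", "CD19", "CD45", "CD8"], [0.0, 2.1, 0.7, 3.4])
print(tokens_a.shape, tokens_b.shape)
print(np.allclose(tokens_a[1], tokens_b[0]))
```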
