Prototype Guided Post-pretraining for Single-Cell Representation Learning

Sachini Weerasekara; Natasha Darras; Sagar Kamarthi; Colles Price; Jacqueline Isaacs

arXiv:2605.07938 · cs.LG · May 11, 2026

TL;DR

This paper introduces CellRefine, a post-pretraining method for single-cell foundation models that uses marker-gene priors to improve downstream task performance and to address poor generalization under long-tailed cell-type distributions and covariate shift in gene expression data.

Contribution

CellRefine is a post-pretraining stage, applied between pretraining and fine-tuning, that incorporates marker-gene sets as structural priors into single-cell representation learning and improves downstream performance.

Findings

01. CellRefine improves downstream performance by up to 15%.

02. It refines the latent embedding manifold of cells.

03. It targets poor generalization of single-cell models under long-tailed cell-type distributions and covariate shift in gene expression data.
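The title describes the method as "prototype guided", and finding 02 says it refines the cell embedding manifold. The abstract excerpt here does not give CellRefine's actual objective, so the following is only a generic illustration of what a prototype-guided term can look like, not the authors' loss. It uses the well-known prototypical-network formulation: each cell type's prototype is the mean embedding of its cells, and each cell is scored by a softmax over negative squared Euclidean distances to all prototypes. The helpers `prototypes`, `sq_dist`, and `prototype_loss` are hypothetical names introduced for this sketch.

```python
import math


def prototypes(embeddings, labels):
    """Hypothetical helper: prototype = mean embedding of the cells with each label."""
    sums, counts = {}, {}
    for emb, lab in zip(embeddings, labels):
        acc = sums.setdefault(lab, [0.0] * len(emb))
        for i, x in enumerate(emb):
            acc[i] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_loss(embeddings, labels, protos):
    """Mean cross-entropy of a softmax over negative squared distances to prototypes.

    Assumption: a prototypical-network style loss, used here only to
    illustrate "prototype guidance"; it is not CellRefine's objective.
    """
    names = sorted(protos)
    total = 0.0
    for emb, lab in zip(embeddings, labels):
        logits = [-sq_dist(emb, protos[n]) for n in names]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_z - logits[names.index(lab)]
    return total / len(embeddings)


if __name__ == "__main__":
    emb = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
    labs = ["A", "A", "B", "B"]
    protos = prototypes(emb, labs)
    print(protos)  # → {'A': [0.0, 1.0], 'B': [10.0, 1.0]}
    print(prototype_loss(emb, labs, protos))
```

Minimizing such a term pulls each cell toward its own cell-type prototype and away from the others, which is one way an embedding manifold can be "refined"; how CellRefine weights this against its other objectives is not stated in this excerpt.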

Abstract

Single-cell representation learning (SCRL) from gene expression data offers a way to uncover the complex regulatory logic underlying cellular function. Inspired by large language models in natural language modeling, several single-cell pretrained models have recently been proposed that treat genes as tokens and cells as sentences. However, these models are fundamentally limited by the long-tailed nature of cell-type distributions and struggle to generalize under covariate shifts in gene expression data. While fine-tuning is often used to mitigate these issues, we observe that performance remains bounded. To address this challenge, we introduce CellRefine, a post-pretraining method that operates between the pretraining and fine-tuning stages of a single-cell foundation model. CellRefine uses a multi-faceted objective that incorporates marker-gene sets as structural priors to guide…
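The abstract mentions two ideas: pretrained models that treat genes as tokens and cells as sentences, and CellRefine's use of marker-gene sets as structural priors. The sketch below illustrates each in simplified form, under stated assumptions. `rank_tokenize` is a hypothetical helper that turns one cell's expression vector into a "sentence" of gene tokens ordered by descending expression; it omits the normalization that models such as Geneformer apply before ranking. `marker_scores` is a hypothetical helper that scores a cell against each cell type's marker-gene set by plain mean expression (no background correction). Neither is CellRefine's objective, which the truncated abstract does not specify; the gene names and marker sets are toy examples.

```python
from statistics import mean


def rank_tokenize(expression, gene_names, max_len=None):
    """Hypothetical: order expressed genes by descending expression, like words in a sentence."""
    expressed = [(value, gene) for gene, value in zip(gene_names, expression) if value > 0]
    # Sort by expression (high to low); break ties by gene name for determinism.
    expressed.sort(key=lambda pair: (-pair[0], pair[1]))
    tokens = [gene for _, gene in expressed]
    return tokens if max_len is None else tokens[:max_len]


def marker_scores(expression, gene_names, marker_sets):
    """Hypothetical: mean expression of each cell type's marker genes in one cell."""
    index = {gene: i for i, gene in enumerate(gene_names)}
    scores = {}
    for cell_type, markers in marker_sets.items():
        present = [expression[index[g]] for g in markers if g in index]
        scores[cell_type] = mean(present) if present else 0.0
    return scores


if __name__ == "__main__":
    genes = ["CD3E", "CD19", "MS4A1", "LYZ", "CD14"]
    cell = [5.0, 0.0, 0.0, 2.0, 1.0]
    markers = {"T cell": ["CD3E"], "B cell": ["CD19", "MS4A1"], "Monocyte": ["LYZ", "CD14"]}
    print(rank_tokenize(cell, genes))  # → ['CD3E', 'LYZ', 'CD14']
    print(marker_scores(cell, genes, markers))  # → {'T cell': 5.0, 'B cell': 0.0, 'Monocyte': 1.5}
```

The rank ordering is what lets a language-model-style architecture read a cell as a token sequence, while per-cell-type marker scores are one simple way a known marker-gene set could be turned into a signal that relates cells to cell types.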