StateXDiff: Cell State-Contextualized Multimodal Diffusion for Single-Cell Perturbation Prediction

Peiting Shi, Ningfeng Que, Xianzhe Huang, Xiaofei Wang, and Jianzhong Jeff Xi

arXiv:2605.16104 · q-bio.GN · May 18, 2026

TL;DR

StateXDiff is a multimodal diffusion framework that predicts single-cell responses to drug perturbations by integrating transcriptomic profiles with inferred protein features, with the aim of improving generalization under out-of-distribution (OOD) conditions.

Contribution

StateXDiff introduces a cell state-contextualized multimodal diffusion model that uses disentangled representations of cellular state and mechanism-aware drug templates to improve prediction accuracy.

Findings

01. Outperforms existing models when predicting responses in unseen cell lines.

02. Effectively models combinatorial drug perturbations.

03. Improves generalization to out-of-distribution conditions.

Abstract

Predicting drug-induced cellular state changes at single-cell resolution remains a central challenge in virtual cell modeling, particularly under out-of-distribution (OOD) conditions. Current approaches predominantly rely on RNA-based assays, which often fail to adequately capture the diverse cellular states underlying drug responses. Moreover, conditional distribution shifts and low signal-to-noise ratios frequently cause models to learn spurious correlations rather than genuine state transitions. To address these limitations, we introduce StateXDiff, a cell State-contextualized multimodal (X) Diffusion framework for predicting single-cell responses to drug perturbations. The framework operates sequentially: first, it learns a disentangled, multimodal representation of cellular state by integrating transcriptomic profiles with inferred protein features; second, it employs a conditional…
