Benchmarking virtual cell models for in-the-wild perturbation response
Xinjie Mao, Songming Zhang, Qianhong Wen, Xiangyu Wen, Kedu Jin, Hao Wu, Shuizhou Chen, Yuqiang Li, Lei Bai, Qi Liu, Ning Ding, Siqi Sun, Zhangyang Gao

TL;DR
This paper introduces a standardized benchmarking framework for virtual cell models. It tests their robustness and generalization in realistic settings (unseen cell contexts, unseen perturbations, and cross-dataset transfer), with the aim of making their predictions more useful in practice.
Contribution
The paper presents a modular evaluation setup that assesses virtual cell models under challenging, real-world conditions, exposing where current models fall short and where future work should focus.
Findings
Model performance varies substantially across biological contexts.
Performance drops under stricter evaluation conditions, indicating limited robustness.
Different metrics produce different model rankings, so conclusions about which model is best depend on the metric chosen.
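The last finding can be illustrated with a toy example. The sketch below is not taken from the paper: the two metrics (mean squared error and Pearson correlation), the model names, and all numbers are assumptions chosen only to show how two reasonable metrics can disagree about which model is better.

```python
import math

def mse(true, pred):
    """Mean squared error: lower is better."""
    return sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true)

def pearson(true, pred):
    """Pearson correlation: higher is better."""
    mean_t = sum(true) / len(true)
    mean_p = sum(pred) / len(pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(true, pred))
    var_t = sum((t - mean_t) ** 2 for t in true)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical measured response and two hypothetical model predictions.
true_response = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.5, 1.5, 3.5, 3.5],  # close in magnitude, coarse in trend
    "model_b": [3.0, 4.0, 5.0, 6.0],  # exact trend, constant offset
}

# Rank best-first under each metric.
rank_by_mse = sorted(predictions, key=lambda m: mse(true_response, predictions[m]))
rank_by_pearson = sorted(
    predictions, key=lambda m: pearson(true_response, predictions[m]), reverse=True
)

print("best by MSE:", rank_by_mse[0])
print("best by Pearson:", rank_by_pearson[0])
```

Here `model_a` wins on error magnitude while `model_b` wins on trend agreement, which is why a benchmark that reports a single metric can give a misleading picture.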
Abstract
Virtual cell (VC) models aim to predict cellular responses to arbitrary perturbations in silico and have emerged as a promising approach for drug discovery and precision medicine. Yet a clear gap remains: while models routinely report impressive results on standard benchmarks, it is unclear whether their predictions are truly meaningful in practice. This is mainly due to limitations in current evaluation setups, which are often overly simplified or inconsistent and do not reflect the complexity and variability of real biological systems. Here, we introduce a standardized and modular benchmarking framework for virtual cell prediction. Our framework evaluates diverse models under challenging in-the-wild scenarios, including unseen cell contexts, unseen perturbations, and cross-dataset generalization, which better reflect practical applications. Our analysis shows that model…
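The three evaluation scenarios named in the abstract can each be viewed as holding out one attribute of the data at test time. The following is a minimal sketch under that assumption, not the paper's framework: the `Sample` type, the `holdout_split` helper, and every label in the toy data are hypothetical placeholders.

```python
from typing import NamedTuple

class Sample(NamedTuple):
    cell_context: str   # e.g. cell line or tissue (placeholder labels)
    perturbation: str   # e.g. a knockout or compound (placeholder labels)
    dataset: str        # source study (placeholder labels)

def holdout_split(samples, field, held_out):
    """Put every sample whose `field` value is in `held_out` into the test
    set, so the model never sees those values during training."""
    train = [s for s in samples if getattr(s, field) not in held_out]
    test = [s for s in samples if getattr(s, field) in held_out]
    return train, test

samples = [
    Sample("context_1", "pert_A", "study_1"),
    Sample("context_1", "pert_B", "study_1"),
    Sample("context_2", "pert_A", "study_1"),
    Sample("context_2", "pert_B", "study_2"),
]

# Unseen cell context: hold out every sample from one context.
ctx_train, ctx_test = holdout_split(samples, "cell_context", {"context_2"})

# Unseen perturbation: hold out every sample of one perturbation.
pert_train, pert_test = holdout_split(samples, "perturbation", {"pert_B"})

# Cross-dataset: train on one study, test on another.
ds_train, ds_test = holdout_split(samples, "dataset", {"study_2"})
```

A random split over samples would leak held-out contexts or perturbations into training, which is one way a simplified evaluation setup can overstate generalization.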
