CellForge: Agentic Design of Virtual Cell Models
Xiangru Tang, Zhuoyun Yu, Jiapeng Chen, Yan Cui, Daniel Shao, Weixu Wang, Fang Wu, Yuchen Zhuang, Wenqi Shi, Zhi Huang, Arman Cohan, Xihong Lin, Fabian Theis, Smita Krishnaswamy, Mark Gerstein

TL;DR
CellForge is a multi-agent framework that autonomously designs and synthesizes neural network architectures tailored to single-cell datasets, enabling innovative and executable models for cellular response prediction.
Contribution
It introduces a novel multi-agent collaboration approach for autonomous neural network architecture discovery in computational biology.
Findings
CellForge produces competitive models across diverse datasets.
The framework enables emergence of new architectural components.
Multi-agent collaboration fosters methodological innovation.
Abstract
Virtual cell modeling aims to predict cellular responses to diverse perturbations but faces challenges from biological complexity, multimodal data heterogeneity, and the need for interdisciplinary expertise. We introduce CellForge, a multi-agent framework that autonomously designs and synthesizes neural network architectures tailored to specific single-cell datasets and perturbation tasks. Given raw multi-omics data and task descriptions, CellForge discovers candidate architectures through collaborative reasoning among specialized agents, then generates executable implementations. Our core contribution is the framework itself: showing that multi-agent collaboration mechanisms - rather than manual human design or single-LLM prompting - can autonomously produce executable, high-quality computational methods. This approach goes beyond conventional hyperparameter tuning by enabling entirely…
Peer Reviews
Decision: ICLR 2026 Conference Desk Rejected Submission
This paper presents an original multi-agent system for single-cell perturbation prediction in computational biology, overcoming the prior reliance on human intervention and demonstrating highly competitive performance. Moreover, the paper is of high quality, providing a comprehensive and clear exposition of technical details, advancing beyond isolated task execution to enable end-to-end autonomous research workflows, and thereby demonstrating notable significance.
This paper lacks sufficient concrete comparative analysis with methods from other works. For example, works such as CellAgent and C2S-Scale appear to have achieved similar automated single-cell data analysis; the paper should thoroughly discuss the differences from these approaches to better highlight its own contributions and value.
The idea is original and the findings are significant. The problem formulation is clever.
The paper has a significant number of typos, including in the abstract (e.g., componentssuch). As the authors acknowledge, performance and outcomes vary significantly across runs. While the framework *can* discover models that match or exceed hand-designed ones, there are no guarantees. The multi-agent framework is computationally expensive and significantly slower than other approaches.
The paper is clearly written, well-structured, and easy to follow, with an extensive benchmark.
In general, the premise that different perturbation datasets require distinct architectures is contentious. Most perturbational single-cell datasets are generated using comparable experimental platforms, and recent models for perturbation prediction — such as **State** — do not report substantial performance gaps across datasets. Moreover, I have investigated one of the reasoning traces in the model repository (`example_report.json` and `example_analysis.json`, as well as examples in the appendi
Taxonomy
Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques · Gene Regulatory Network Analysis
