scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery

Yiming Gao; Zhen Wang; Jefferson Chen; Mark Antkowiak; Mengzhou Hu; JungHo Kong; Dexter Pratt; Jieyuan Liu; Enze Ma; Zhiting Hu; Eric P. Xing

arXiv:2602.11609 · cs.AI · February 13, 2026

Open Access

TL;DR

scPilot introduces a novel framework where large language models perform step-by-step reasoning directly on single-cell RNA-seq data, improving accuracy and interpretability in bioinformatics analysis.

Contribution

scPilot is the first systematic approach to integrating LLMs with raw omics data for transparent, iterative single-cell analysis.

Findings

01

11% average accuracy improvement in cell-type annotation (o1, iterative reasoning vs. one-shot prompting)

02

30% reduction in trajectory graph-edit distance (Gemini-2.5-Pro, vs. one-shot prompting)

03

Generation of interpretable reasoning traces
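Graph-edit distance (GED) counts the node and edge insertions and deletions needed to turn a predicted trajectory graph into the reference one. The sketch below is illustrative only and makes an assumption not stated on this page: each node is a uniquely labeled cell state, so nodes are matched by label. Under that fixed matching, the unit-cost edit distance is the size of the symmetric difference of the node sets plus that of the edge sets. This is an upper bound on unrestricted GED, which searches over all node matchings (as `networkx.graph_edit_distance` does). scBench's actual grader may compute GED differently. The hematopoiesis-style graphs are toy data.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # directed edge: (parent state, child state)


def fixed_match_ged(nodes_a: Set[str], edges_a: Set[Edge],
                    nodes_b: Set[str], edges_b: Set[Edge]) -> int:
    """Unit-cost edit distance when nodes are matched by their labels.

    Every node or edge present in only one graph costs exactly one
    insertion or deletion, so the total is the size of the two
    symmetric differences.
    """
    return len(nodes_a ^ nodes_b) + len(edges_a ^ edges_b)


# Toy reference and predicted trajectories (illustrative data only).
REF_NODES = {"HSC", "MPP", "CMP", "GMP", "MEP"}
REF_EDGES = {("HSC", "MPP"), ("MPP", "CMP"), ("CMP", "GMP"), ("CMP", "MEP")}
PRED_NODES = set(REF_NODES)
# The prediction attaches GMP to the wrong parent.
PRED_EDGES = {("HSC", "MPP"), ("MPP", "CMP"), ("MPP", "GMP"), ("CMP", "MEP")}

print(fixed_match_ged(REF_NODES, REF_EDGES, PRED_NODES, PRED_EDGES))  # 2
```

A single misattached branch costs 2: one deleted edge plus one inserted edge. A lower GED therefore means the predicted lineage structure is closer to the reference.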

Abstract

We present scPilot, the first systematic framework to practice omics-native reasoning: a large language model (LLM) converses in natural language while directly inspecting single-cell RNA-seq data and calling bioinformatics tools on demand. scPilot converts core single-cell analyses, i.e., cell-type annotation, developmental-trajectory reconstruction, and transcription-factor targeting, into step-by-step reasoning problems that the model must solve, justify, and, when needed, revise with new evidence. To measure progress, we release scBench, a suite of 9 expertly curated datasets and graders that faithfully evaluate the omics-native reasoning capability of scPilot across various LLMs. Experiments show that iterative omics-native reasoning with o1 lifts average accuracy by 11% for cell-type annotation, and that Gemini-2.5-Pro cuts trajectory graph-edit distance by 30% versus one-shot prompting, while…
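The solve, justify, and revise cycle described in the abstract can be illustrated with a small propose/verify loop. This is a hypothetical sketch and not scPilot's implementation. A rule-based `propose` function stands in for the LLM, and a marker-gene check stands in for tool-based evidence. The marker table, the function names (`propose`, `verify`, `annotate`), and the acceptance rule (every marker must be expressed) are all assumptions made for illustration. Each round appends a line to a trace, mirroring the interpretable reasoning traces the paper reports.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy marker table (hypothetical; not scPilot's knowledge base).
MARKERS: Dict[str, Set[str]] = {
    "CD8 T cell": {"CD3E", "CD8A", "NKG7"},
    "NK cell": {"NKG7", "GNLY"},
    "B cell": {"MS4A1"},
}


def propose(genes: Set[str], rejected: Set[str]) -> Optional[str]:
    """Hypothesis step: first non-rejected type sharing any marker with the cluster."""
    for label, markers in MARKERS.items():
        if label not in rejected and markers & genes:
            return label
    return None


def verify(label: str, genes: Set[str]) -> Tuple[bool, Set[str]]:
    """Evidence step: accept only if every marker of the label is expressed."""
    missing = MARKERS[label] - genes
    return not missing, missing


def annotate(genes: Set[str], max_rounds: int = 3) -> Tuple[Optional[str], List[str]]:
    """Iterate propose -> verify -> revise, recording a human-readable trace."""
    rejected: Set[str] = set()
    trace: List[str] = []
    for _ in range(max_rounds):
        label = propose(genes, rejected)
        if label is None:
            trace.append("no remaining candidate")
            return None, trace
        ok, missing = verify(label, genes)
        if ok:
            trace.append(f"accept {label}: all markers expressed")
            return label, trace
        # Revision: record the contradicting evidence and try another hypothesis.
        trace.append(f"reject {label}: missing {sorted(missing)}")
        rejected.add(label)
    return None, trace


label, trace = annotate({"NKG7", "GNLY"})
print(label)  # NK cell
for step in trace:
    print(step)
```

For a cluster expressing NKG7 and GNLY but not CD3E, the first hypothesis (CD8 T cell) is rejected on missing-marker evidence and revised to NK cell. The trace records both steps.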

Taxonomy

Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques · Domain Adaptation and Few-Shot Learning