scBench: Evaluating AI Agents on Single-Cell RNA-seq Analysis
Kenny Workman, Zhen Yang, Harihara Muralidharan, Aidan Abdulali, Hannah Le

TL;DR
scBench is a comprehensive benchmark for evaluating AI models on real-world single-cell RNA sequencing analysis tasks, highlighting the impact of platform variability and providing a tool for developing more accurate biological data analysis agents.
Contribution
Introduces scBench, a new benchmark of 394 verifiable problems for assessing AI agents on practical scRNA-seq workflows across six sequencing platforms and seven task categories.
Findings
Eight frontier models achieve 29-53% accuracy on scBench tasks.
Platform choice affects accuracy as much as model choice.
Model accuracy drops over 40% on less-documented technologies.
Abstract
As single-cell RNA sequencing datasets grow in adoption, scale, and complexity, data analysis remains a bottleneck for many research groups. Although frontier AI agents have improved dramatically at software engineering and general data analysis, it remains unclear whether they can extract biological insight from messy, real-world single-cell datasets. We introduce scBench, a benchmark of 394 verifiable problems derived from practical scRNA-seq workflows spanning six sequencing platforms and seven task categories. Each problem provides a snapshot of experimental data immediately prior to an analysis step and a deterministic grader that evaluates recovery of a key biological result. Benchmark results for eight frontier models show that accuracy ranges from 29% to 53%, with strong model-task and model-platform interactions. Platform choice affects accuracy as much as model choice, with 40+…
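The abstract describes each problem as a data snapshot paired with a deterministic grader. As a rough illustration of that idea only, the Python sketch below shows one way such graders and an accuracy score could be structured. Every name here (`Problem`, `numeric_grader`, `gene_set_grader`, `accuracy`), the tolerance and Jaccard thresholds, and the example platform and task labels are hypothetical; none of them is taken from scBench itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Problem:
    """Hypothetical problem record: metadata plus a pass/fail grader."""
    problem_id: str
    platform: str   # illustrative label, not an scBench platform name
    task: str       # illustrative label, not an scBench task category
    grader: Callable[[object], bool]


def numeric_grader(expected: float, rel_tol: float) -> Callable[[object], bool]:
    """Pass if the answer is a number within rel_tol of the expected value."""
    def grade(answer: object) -> bool:
        try:
            value = float(answer)
        except (TypeError, ValueError):
            return False  # missing or non-numeric answers fail
        return abs(value - expected) <= rel_tol * abs(expected)
    return grade


def gene_set_grader(expected: Iterable[str], min_jaccard: float) -> Callable[[object], bool]:
    """Pass if the answered gene list overlaps the expected set enough (Jaccard)."""
    expected_set = {g.upper() for g in expected}

    def grade(answer: object) -> bool:
        if not isinstance(answer, (list, set, tuple)):
            return False
        got = {str(g).upper() for g in answer}
        union = expected_set | got
        if not union:
            return False
        return len(expected_set & got) / len(union) >= min_jaccard
    return grade


def accuracy(problems: List[Problem], answers: Dict[str, object]) -> float:
    """Fraction of problems whose grader accepts the submitted answer."""
    passed = sum(1 for p in problems if p.grader(answers.get(p.problem_id)))
    return passed / len(problems)


# Illustrative problems with made-up expected values.
problems = [
    Problem("p1", "platform_a", "qc", numeric_grader(1000.0, rel_tol=0.05)),
    Problem("p2", "platform_b", "annotation", gene_set_grader(["CD3E", "CD3D"], 0.5)),
]
good_answers = {"p1": 1020, "p2": ["cd3e", "CD3D", "CD4"]}
bad_answers = {"p1": 1100}
```

Because each grader is a pure function of the submitted answer, grading the same answer twice gives the same result, which is the property "deterministic" implies. Per-platform or per-task accuracy would follow by filtering `problems` on the corresponding field before calling `accuracy`.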
Taxonomy
Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques · Cancer Genomics and Diagnostics
