SpatialBench: Can Agents Analyze Real-World Spatial Biology Data?
Kenny Workman, Zhen Yang, Harihara Muralidharan, Hannah Le

TL;DR
SpatialBench is a benchmark of 146 verifiable problems that tests whether AI agents can analyze messy, real-world spatial transcriptomics data. Frontier base models currently solve only 20-38% of the problems.
Contribution
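The paper introduces SpatialBench, a benchmark of 146 problems drawn from practical spatial analysis workflows, spanning five spatial technologies and seven task categories, for assessing how well AI agents perform real spatial biology analysis tasks.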
Findings
Base model accuracy remains low (20-38%) across model families.
Performance depends strongly on the combination of model and task, and of model and platform.
Harness design elements such as prompts and control flow greatly affect results.
Abstract
Spatial transcriptomics assays are rapidly increasing in scale and complexity, making computational analysis a major bottleneck in biological discovery. Although frontier AI agents have improved dramatically at software engineering and general data analysis, it remains unclear whether they can extract biological insight from messy, real-world spatial datasets. We introduce SpatialBench, a benchmark of 146 verifiable problems derived from practical spatial analysis workflows spanning five spatial technologies and seven task categories. Each problem provides a snapshot of experimental data immediately prior to an analysis step and a deterministic grader that evaluates recovery of a key biological result. Benchmark data on frontier models shows that base model accuracy remains low (20-38% across model families), with strong model-task and model-platform interactions. Harness design has a…
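The abstract describes each problem as pairing a data snapshot with a deterministic grader. The sketch below illustrates what such a grader could look like; it is not the authors' implementation. Every name here (`numeric_grader`, `set_grader`, `Problem`, `accuracy`), the tolerance values, and the example answers are hypothetical and chosen only for illustration.

```python
import math
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Callable, Dict, List

# A grader maps an agent's answer to pass/fail with no randomness,
# so the same answer always receives the same grade.
Grader = Callable[[object], bool]


def numeric_grader(expected: float, rel_tol: float = 0.05) -> Grader:
    """Pass if the answer is a number within rel_tol of the expected value."""
    def grade(answer: object) -> bool:
        try:
            value = float(answer)
        except (TypeError, ValueError):
            return False  # missing or non-numeric answers fail
        return math.isclose(value, expected, rel_tol=rel_tol)
    return grade


def set_grader(expected: Iterable[str], min_jaccard: float = 0.6) -> Grader:
    """Pass if the answer set overlaps the expected set (Jaccard index)."""
    expected_set = set(expected)

    def grade(answer: object) -> bool:
        # Reject strings (they would be split into characters) and non-iterables.
        if isinstance(answer, str) or not isinstance(answer, Iterable):
            return False
        try:
            answer_set = set(answer)
        except TypeError:
            return False  # unhashable elements
        union = expected_set | answer_set
        if not union:
            return True  # both sets empty
        return len(expected_set & answer_set) / len(union) >= min_jaccard
    return grade


@dataclass(frozen=True)
class Problem:
    problem_id: str
    grader: Grader


def accuracy(problems: List[Problem], answers: Dict[str, object]) -> float:
    """Fraction of problems passed; an unanswered problem counts as a failure."""
    if not problems:
        return 0.0
    passed = sum(p.grader(answers.get(p.problem_id)) for p in problems)
    return passed / len(problems)


# Hypothetical problems and answers, invented for this example only.
problems = [
    Problem("cell_count", numeric_grader(100.0)),
    Problem("marker_genes", set_grader({"CD3E", "CD8A", "GZMB"})),
]
answers = {"cell_count": 103, "marker_genes": ["CD3E", "CD8A", "MS4A1"]}
score = accuracy(problems, answers)
```

Because the grade depends only on the submitted answer and fixed thresholds, repeated runs of the same agent output yield the same score, which is the property the abstract calls "deterministic".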
Taxonomy
Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques · Modular Robots and Swarm Intelligence
