ConnectomeBench: Can LLMs Proofread the Connectome?
Jeff Brown, Andrew Kirjner, Annika Vivekananthan, Ed Boyden

TL;DR
ConnectomeBench evaluates whether current large language models can automate the proofreading of neural connectome data, showing promising results in certain tasks but still lagging behind human experts.
Contribution
This paper introduces ConnectomeBench, a benchmark for assessing LLMs on connectome proofreading tasks, and provides the first comprehensive evaluation of multiple LLMs on this domain.
Findings
LLMs reach 52-82% accuracy on segment type identification.
LLMs reach 75-85% accuracy on split error correction.
Models struggle with merge error detection.
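The per-task accuracy figures above can be made concrete with a small scoring sketch. This is only an illustration under stated assumptions, not the benchmark's actual evaluation code: the record layout (task, predicted, expected), the function name accuracy_by_task, and the toy labels are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_task(records):
    """Return {task: fraction of correct answers} for (task, predicted, expected) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, predicted, expected in records:
        total[task] += 1
        if predicted == expected:
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}


# Hypothetical toy records for illustration only; these are not data from the paper.
records = [
    ("segment_identification", "neuron", "neuron"),
    ("segment_identification", "glia", "neuron"),
    ("split_error_correction", "A", "A"),
    ("split_error_correction", "B", "B"),
    ("merge_error_detection", "no", "yes"),
]

scores = accuracy_by_task(records)
for task, score in scores.items():
    print(f"{task}: {score:.0%}")
```

Plain accuracy is easiest to interpret next to the chance level for each task, which depends on the number of answer options and is not stated in the summary above.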
Abstract
Connectomics, the mapping of neural connections in an organism's brain, currently requires extraordinary human effort to proofread the data collected from imaging and machine-learning-assisted segmentation. With the growing excitement around using AI agents to automate important scientific tasks, we explore whether current AI systems can perform multiple tasks necessary for data proofreading. We introduce ConnectomeBench, a multimodal benchmark evaluating large language model (LLM) capabilities in three critical proofreading tasks: segment type identification, split error correction, and merge error detection. Using expert-annotated data from two large open-source datasets (a cubic millimeter of mouse visual cortex and the complete Drosophila brain), we evaluate proprietary multimodal LLMs including Claude 3.7/4 Sonnet, o4-mini, GPT-4.1, GPT-4o, as well as open-source models like…
Taxonomy
Topics: Multimodal Machine Learning Applications · Ferroelectric and Negative Capacitance Devices · Neurobiology of Language and Bilingualism
