Neurodata Without Boredom: Benchmarking Agentic AI for Data Reuse
Ling-Qi Zhang, Kristin Branson

TL;DR
This paper benchmarks agentic AI's ability to understand, reformat, and assist in neuroscience data reuse across diverse formats, highlighting its strengths and limitations in automating data integration tasks.
Contribution
It introduces a systematic evaluation of large language model agents on neuroscience data reformatting, revealing both their capabilities and the challenges they face in automating data integration.
Findings
Agents perform well on individual sub-tasks but struggle with end-to-end accuracy.
Agents often make errors that are hard to detect without ground-truth references.
Human-in-the-loop approaches remain essential for reliable data curation.
Abstract
Neuroscience data are highly fragmented across labs, formats, and experimental paradigms, and reuse often requires substantial manual effort. A persistent roadblock to data reuse and integration is the need to decipher bespoke and diverse data formatting choices. Common data formats have been proposed in response, but the field continues to struggle with a fundamental tension: formats flexible enough to accommodate diverse experiments are rarely descriptive enough to be self-explanatory, and sufficiently descriptive formats demand detailed documentation and curation effort that few labs can sustain. Agentic AI is a natural candidate to solve this problem: LLMs read code and text faster than humans, and with sustained attention to the low-level details that humans tend to skim over. To measure how well agentic AI performs on this task, we selected eight recent papers studying large-scale mouse neural…
