Agentic AI Scientists Are Not Built For Autonomous Scientific Discovery
Harshit Bisht, Vinay Kumar, Kevin Maik Jablonka, Mausam, N. M. Anoop Krishnan

TL;DR
This paper argues that current agentic AI scientists are not suited to autonomous scientific discovery because of fundamental challenges in problem selection, missing tacit knowledge, reduced output diversity, and benchmarking.
Contribution
It identifies key challenges and proposes foundational design changes needed to develop truly autonomous AI scientists.
Findings
Problem selection in current AI scientists is influenced by the McNamara fallacy.
Large language models lack tacit procedural and failure knowledge.
Most scientific benchmarks measure single-turn prediction accuracy and do not incorporate feedback from physical experiments.
Abstract
A growing body of work pursues AI scientists capable of end-to-end autonomous scientific discovery. This position paper argues that although they already function as co-scientists, agentic AI scientists are not built for autonomous scientific discovery. We identify the following challenges in building and deploying autonomous AI scientists: (1) Problem selection is influenced by the McNamara fallacy; (2) Agents are built on large language models (LLMs) whose training corpora omit tacit procedural and failure knowledge of laboratory practice; (3) Preference optimisation during post-training compresses output diversity toward consensus; and (4) Most scientific benchmarks measure single-turn prediction accuracy and lack feedback from physical experiments back to the computational model. These challenges are not just questions of scale and scaffolding; they require revisiting fundamental…
