Sound Agentic Science Requires Adversarial Experiments

Dionizije Fa; Marko Culjak

arXiv:2604.22080 · cs.AI · May 21, 2026

TL;DR

This paper argues that scientific claims generated by large language model (LLM) agents should be tested through adversarial experiments designed to falsify them, so that false positives are caught rather than validated by persuasive narratives.

Contribution

It introduces the concept of adversarial experiments as a standard for evaluating agent-produced scientific claims, emphasizing falsification over narrative persuasion.

Findings

1. Agents tend to produce plausible but unverified claims.
2. Falsification-first testing can improve scientific reliability.
3. Adversarial experiments help identify potential failures in claims.
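The third finding can be made concrete with a simple negative control. The sketch below is our own illustration, not the authors' protocol. It uses a two-sided permutation test, and the function name `permutation_p_value` and its parameters are hypothetical. The test asks whether shuffling the group labels, which destroys any real relationship, reproduces an effect as large as the one claimed. If it often does, the claim has not survived the attempt to falsify it.

```python
import random
from statistics import fmean


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch).

    Adversarial framing: if randomly relabelling the pooled data produces a
    mean difference at least as large as the observed one, the claimed
    effect cannot be distinguished from chance.
    """
    rng = random.Random(seed)
    observed = abs(fmean(x) - fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        # Shuffling breaks any genuine link between group label and value.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_x]) - fmean(pooled[n_x:]))
        # Small tolerance so floating-point noise does not hide exact ties.
        if diff >= observed - 1e-12:
            hits += 1
    # The +1 correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    treated = list(range(10, 18))
    control = list(range(8))
    print(permutation_p_value(treated, control))  # small: the effect survives
    print(permutation_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
```

For two identical samples the observed difference is zero, every shuffle ties it, and the function returns exactly 1.0. A permutation test is only one adversarial check: it tests a claim against chance, not against confounding or a flawed study design.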

Abstract

LLM-based agents are rapidly being adopted for scientific data analysis, automating tasks once limited by human time and expertise. This capability is often framed as an acceleration of discovery, but it also accelerates a familiar failure mode: the rapid production of plausible, endlessly revisable analyses. Because such analyses are easy to generate, they effectively turn hypothesis space into candidate claims supported by selectively chosen analyses and optimized for publishable positives. Unlike software, scientific knowledge is not validated by the iterative accumulation of code and post hoc statistical support. A fluent explanation or a significant result on a single dataset is not verification. The missing evidence is a negative space: the experiments and analyses that would have falsified the claim were never run or never published. We therefore propose that non-experimental claims produced with…
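The abstract's statistical point, that selectively chosen analyses inflate false positives, can be checked with a small simulation. This is a hedged sketch under our own assumptions, not the authors' method. The outcomes are assumed independent, and both groups are assumed to have known unit variance, so a z-test stands in for whatever test an agent would actually run. The names `simulate`, `z_test_p`, `null_sample`, and `familywise_rate` are hypothetical. A simulated "agent" tries k analyses on data with no true effect and claims a finding if any reaches p < α. The adversarial step then reruns the chosen analysis on fresh data.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def z_test_p(x, y):
    """Two-sided z-test for a mean difference, assuming unit variance in both groups."""
    se = (1 / len(x) + 1 / len(y)) ** 0.5
    z = (fmean(x) - fmean(y)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def null_sample(rng, n):
    # Both groups come from the same N(0, 1): there is no true effect to find.
    return [rng.gauss(0, 1) for _ in range(n)], [rng.gauss(0, 1) for _ in range(n)]


def familywise_rate(alpha, k):
    """Chance that at least one of k independent null tests is 'significant'."""
    return 1 - (1 - alpha) ** k


def simulate(k=20, n=20, alpha=0.05, trials=1000, seed=0):
    rng = random.Random(seed)
    claimed = survived = 0
    for _ in range(trials):
        # Selective analysis: try k null outcomes and keep the best-looking one.
        p_values = [z_test_p(*null_sample(rng, n)) for _ in range(k)]
        if min(p_values) < alpha:
            claimed += 1
            # Adversarial step: rerun the chosen analysis on fresh, independent data.
            if z_test_p(*null_sample(rng, n)) < alpha:
                survived += 1
    return claimed / trials, survived / trials


if __name__ == "__main__":
    print(round(familywise_rate(0.05, 20), 3))  # 0.642
    claimed, survived = simulate()
    print(f"claimed: {claimed:.3f}, survived replication: {survived:.3f}")
```

With α = 0.05 and k = 20, the chance of at least one spurious "significant" result is 1 - 0.95^20 ≈ 0.64. The simulated claim rate should land near that value, while only a few percent of trials survive the fresh-data replication (roughly α times the claim rate, since the replication is itself a null test). Replication on new data catches chance findings, but not systematic biases shared by both datasets.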
