Towards grounded autonomous research: an end-to-end LLM mini research loop on published computational physics

Haonan Huang

arXiv:2604.12198 · physics.comp-ph · April 15, 2026

TL;DR

This paper demonstrates an autonomous LLM research loop that reads, reproduces, critiques, and extends computational physics papers. Across 111 papers the agent raised substantive concerns on about 42% of them, and in one in-depth case study it produced a publishable comment.

Contribution

It introduces an end-to-end autonomous research framework applied to computational physics, capable of critical analysis and generating new scientific content without supervision.

Findings

01

Across 111 open-access computational physics papers, the agent raised substantive concerns on about 42% without being asked to critique.

02

An in-depth case study on one Nature Communications paper produced a publishable comment that revises the original paper's conclusion.

03

The system autonomously generated a complete, typeset, and PDF-iterated comment.

Abstract

Recent autonomous LLM agents have demonstrated end-to-end automation of machine-learning research. Real-world physical science is intrinsically harder, requiring deep reasoning bounded by physical truth and, because real systems are too complex to study in isolation, almost always built on existing literature. We focus on the smallest meaningful unit of such research, a mini research loop in which an agent reads a paper, reproduces it, critiques it, and extends it. We test this loop in two complementary regimes: scale and depth. At scale, across 111 open-access computational physics papers, an agent autonomously runs the read-plan-compute-compare loop and, without being asked to critique, raises substantive concerns on ~42% of papers - 97.7% of which require execution to surface. In depth, for one Nature Communications paper on multiscale simulation of a 2D-material MOSFET, the agent…
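The mini research loop described in the abstract (read, reproduce, critique, extend) can be pictured as a fixed sequence of stages applied to each paper. The sketch below is illustrative only and is not the paper's implementation: only the four stage names come from the abstract, while `LoopRecord`, `run_mini_loop`, `concern_rate`, and the handler interface are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stage names follow the abstract; everything else in this sketch is hypothetical.
STAGES = ["read", "reproduce", "critique", "extend"]


@dataclass
class LoopRecord:
    """Accumulates what each stage produced for one paper."""
    paper_id: str
    outputs: Dict[str, str] = field(default_factory=dict)
    concerns: List[str] = field(default_factory=list)


# A handler receives the record so far and returns a text summary of its stage.
Handler = Callable[[LoopRecord], str]


def run_mini_loop(paper_id: str, handlers: Dict[str, Handler]) -> LoopRecord:
    """Run the four stages in order, storing each stage's output on the record."""
    record = LoopRecord(paper_id)
    for stage in STAGES:
        record.outputs[stage] = handlers[stage](record)
    return record


def concern_rate(records: List[LoopRecord]) -> float:
    """Fraction of papers for which at least one concern was recorded."""
    if not records:
        return 0.0
    flagged = sum(1 for r in records if r.concerns)
    return flagged / len(records)
```

In this arrangement a handler (for example, an assumed LLM-backed `critique` step) would append to `record.concerns`, and `concern_rate` over all records gives the kind of at-scale statistic quoted in the abstract.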
