How Far Are We From True Auto-Research?

Zhengxin Zhang; Ning Wang; Sainyam Galhotra; Claire Cardie

arXiv:2605.19156·cs.AI·May 20, 2026

TL;DR

This study introduces ResearchArena, a minimal scaffold in which off-the-shelf agents carry out autonomous research and write full papers. The manuscripts look promising under manuscript-only review, but weak experimental rigor keeps them from meeting acceptance standards at top venues.

Contribution

The paper presents ResearchArena, a minimal scaffold enabling agents to conduct autonomous research, and provides a comprehensive evaluation of their output's quality and experimental validity.

Findings

01

Manuscript-only reviews favor certain agents, but do not reflect actual research quality.

02

Artifact-aware peer review exposes major issues in experimental rigor.

03

None of the agent-generated papers meet top-tier acceptance standards.

Abstract

Recent auto-research systems can produce complete papers, but feasibility is not the same as quality, and the field still lacks a systematic study of how good agent-generated papers actually are. We introduce ResearchArena, a minimal scaffold that lets off-the-shelf agents (Claude Code using Opus 4.6, Codex using GPT-5.4, and Kimi Code using K2.5) carry out the full research loop themselves (ideation, experimentation, paper writing, self-refinement) under only lightweight guidance. Across 13 computer science seeds and 3 trials per agent-domain pair, ResearchArena yields 117 agent-generated papers, each evaluated under three complementary lenses: a manuscript-only reviewer (SAR), an artifact-aware peer review (PR) in which agents inspect the workspace alongside the manuscript, and a human-conducted meta-review. Under SAR alone the picture is optimistic: Claude Code obtains the highest…
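The paper count in the abstract follows from the experimental design: 3 agents, 13 seeds, and 3 trials per agent-domain pair. The short sketch below enumerates that grid to make the arithmetic explicit. It assumes each seed corresponds to one domain; the seed labels and the `runs` structure are hypothetical placeholders, not names from the paper.

```python
from itertools import product

# Agents named in the abstract.
AGENTS = ["Claude Code", "Codex", "Kimi Code"]

# Counts stated in the abstract.
NUM_SEEDS = 13
TRIALS_PER_PAIR = 3

# Hypothetical seed labels; the abstract does not list the actual seeds.
seeds = [f"seed_{i:02d}" for i in range(1, NUM_SEEDS + 1)]
trials = range(1, TRIALS_PER_PAIR + 1)

# One run (and, per the abstract, one paper) per (agent, seed, trial) combination.
runs = list(product(AGENTS, seeds, trials))

# 3 agents * 13 seeds * 3 trials = 117 papers, matching the abstract.
print(len(runs))
```

Each agent therefore contributes 39 of the 117 papers, so per-agent comparisons rest on equally sized samples.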
