Paper Reconstruction Evaluation: Evaluating Presentation and Hallucination in AI-written Papers

Atsuyuki Miyai; Mashiro Toyooka; Zaiying Zhao; Kenta Watanabe; Toshihiko Yamasaki; Kiyoharu Aizawa

arXiv:2604.01128·cs.CL·April 2, 2026

TL;DR

This paper presents PaperRecon, a framework that evaluates AI-written papers on two axes, presentation quality and hallucination risk, using a benchmark built from recent papers, and analyzes the trade-off between the two across coding agents.

Contribution

The paper introduces an evaluation framework and a benchmark for AI-written papers that assess presentation quality and hallucination risk as separate dimensions.

Findings

01. ClaudeCode achieves higher presentation quality but produces more hallucinations.
02. Codex produces fewer hallucinations but achieves lower presentation quality.
03. Both agents improve as their underlying models advance, yet the trade-off persists.

Abstract

This paper introduces the first systematic evaluation framework for quantifying the quality and risks of papers written by modern coding agents. While AI-driven paper writing has become a growing concern, rigorous evaluation of the quality and potential risks of AI-written papers remains limited, and a unified understanding of their reliability is still lacking. We introduce Paper Reconstruction Evaluation (PaperRecon), an evaluation framework in which an overview (overview.md) is created from an existing paper, after which an agent generates a full paper based on the overview and minimal additional resources, and the result is subsequently compared against the original paper. PaperRecon disentangles the evaluation of the AI-written papers into two orthogonal dimensions, Presentation and Hallucination, where Presentation is evaluated using a rubric and Hallucination is assessed via…
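
The reconstruction loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `ReconResult`, `rubric_score`, and `evaluate` are hypothetical, the rubric is modeled as simple pass/fail checks, and the hallucination measure is left as a caller-supplied function because the abstract is truncated before describing it.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ReconResult:
    """Two separately reported scores (hypothetical container)."""
    presentation: float   # fraction of rubric items satisfied, in [0, 1]
    hallucinations: int   # count returned by the caller-supplied checker


def rubric_score(checks: Dict[str, bool]) -> float:
    """Assumed scoring rule: share of rubric items that pass."""
    if not checks:
        raise ValueError("rubric must contain at least one item")
    return sum(checks.values()) / len(checks)


def evaluate(
    original: str,
    overview: str,
    write_paper: Callable[[str], str],
    rubric: Dict[str, Callable[[str], bool]],
    count_hallucinations: Callable[[str, str], int],
) -> ReconResult:
    """Generate a paper from the overview, then score it against the original.

    `write_paper` stands in for the coding agent. `count_hallucinations`
    compares the original with the generated paper; its method is an
    assumption here, since the source does not specify it.
    """
    generated = write_paper(overview)
    checks = {name: check(generated) for name, check in rubric.items()}
    return ReconResult(
        presentation=rubric_score(checks),
        hallucinations=count_hallucinations(original, generated),
    )
```

Keeping the two scores in separate fields, rather than folding them into one number, mirrors the abstract's point that Presentation and Hallucination are evaluated as distinct dimensions, so a trade-off between them stays visible.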

Code & Models

Repositories

agent4science-utokyo/PaperRecon
github

Datasets

hal-utokyo/PaperWrite-Bench
dataset· 100 dl