APRES: An Agentic Paper Revision and Evaluation System

Bingchen Zhao; Jenny Zhang; Chenxi Whitehouse; Minqi Jiang; Michael Shvartsman; Abhishek Charnalia; Despoina Magka; Tatiana Shavrina; Derek Dunfield; Oisin Mac Aodha; Yoram Bachrach

arXiv:2603.03142·cs.CL·March 4, 2026

Open Access · 3 Reviews

TL;DR

APRES leverages Large Language Models to automatically revise scientific papers based on an evaluation rubric, improving citation prediction accuracy and human preference, thereby enhancing scientific communication and impact.

Contribution

This paper introduces APRES, a novel LLM-powered system that automatically revises scientific manuscripts to improve their quality and future citation potential without altering core content.

Findings

01. Improves citation prediction accuracy by 19.6%.

02. Papers revised by APRES are preferred by experts 79% of the time.

03. Demonstrates LLMs can effectively stress-test manuscripts before submission.

Abstract

Scientific discoveries must be communicated clearly to realize their full potential. Without effective communication, even the most groundbreaking findings risk being overlooked or misunderstood. The primary way scientists communicate their work and receive feedback from the community is through peer review. However, the current system often provides inconsistent feedback between reviewers, ultimately hindering the improvement of a manuscript and limiting its potential impact. In this paper, we introduce a novel method, APRES, powered by Large Language Models (LLMs), to update a scientific paper's text based on an evaluation rubric. Our automated method discovers a rubric that is highly predictive of future citation counts, and we integrate it with APRES in an automated system that revises papers to enhance their quality and impact. Crucially, this objective should be met without altering the…
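The abstract describes revising a paper's text against an evaluation rubric. The loop below is only a minimal sketch of that general idea, not the authors' implementation: it keeps a candidate revision when a rubric score improves and stops when no candidate helps. The names `revise_with_rubric`, `toy_score`, and `toy_propose` are hypothetical, and the toy scorer and proposer are deterministic stand-ins for what would presumably be LLM calls. The sketch makes no attempt to check that core content is preserved.

```python
from typing import Callable, Iterable


def revise_with_rubric(
    text: str,
    score: Callable[[str], float],
    propose: Callable[[str], Iterable[str]],
    max_rounds: int = 5,
) -> tuple[str, float]:
    """Greedy search: accept a candidate revision only if the rubric score rises."""
    best_text, best_score = text, score(text)
    for _ in range(max_rounds):
        improved = False
        for candidate in propose(best_text):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best_text, best_score = candidate, candidate_score
                improved = True
        if not improved:
            break  # no candidate beat the current best, so stop early
    return best_text, best_score


# Hypothetical toy stand-ins so the sketch runs without any LLM.
FILLERS = {"very", "really", "basically"}


def toy_score(text: str) -> float:
    # Pretend rubric: each filler word costs one point.
    words = text.lower().split()
    return -float(sum(w.strip(".,") in FILLERS for w in words))


def toy_propose(text: str) -> Iterable[str]:
    # Propose every variant of the text with exactly one filler word removed.
    words = text.split()
    for i, word in enumerate(words):
        if word.lower().strip(".,") in FILLERS:
            yield " ".join(words[:i] + words[i + 1:])
```

Calling `revise_with_rubric("Our method is very fast and really simple.", toy_score, toy_propose)` removes one filler word per round until the toy score can no longer improve. In a real system the scorer would be the learned rubric and the proposer an LLM; both are assumptions here.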

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The paper proposes an original problem formulation and solution to the important problem of evaluating and improving the presentation of scientific papers. While prior work has studied aspects of this problem at a more component level, this paper takes a more end-to-end approach in a mostly ecologically valid setting with recent AI papers. The use of language models to automatically derive rubrics is understudied, and I appreciate this paper demonstrating the end-to-end effectiveness of this app

Weaknesses

This paper's primary original contribution lies in the problem definition, and it does not make a large advance in terms of the development of new optimization methods (this is fine). See the "Questions" section for a significant question about the experimental setup in the evaluation of the Phase 2 (rewriting) problem: there is a lack of clarity, but if the evaluation metric is not held constant across the subplots in Figure 2 then that would call into question the internal validity of that expe

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. Problem impact/significance: The paper presents an end-to-end system for the impactful and timely problem of using LLMs to generate useful feedback for paper revision.
2. The idea of using an LLM-based agent to iteratively generate rubrics that are predictive of citation count based on text content appears to be novel, and the proposed MultiAIDE search method outperforms existing methods for predicting future citation count.
3. The human evaluation results scoring paper quality suggest tha

Weaknesses

1. As noted in the abstract, it is important that any paper revision system not revise core scientific content in a paper. While there are some constraints placed on the system to limit revision of such content, none of the evaluations of APRES and revised papers measure whether scientific content changed. This to me is one of the key weaknesses of this work, as a paper revision system that always changed a lot of scientific content such that the results looked more positive would be expected to

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. The idea of using LLMs to assist paper writing is promising.
2. Agentic paper revision and evaluation is intuitive to help researchers in writing papers.

Weaknesses

1. The core design of this paper uses the number of **citations** as a metric to improve given papers' presentation and readability, which is questionable. The authors cite Ante (2022) as support. However:
   a) The claim of Ante's paper is "Our analysis furthermore shows that higher readability scores significantly relates to the likelihood of articles not receiving any citations", which **contradicts the claim of this paper**.
   b) Even if readability (x) can increase or decrease scientific impact

Taxonomy

Topics: Expert finding and Q&A systems · Topic Modeling · Advanced Text Analysis Techniques