Pre-review to Peer review: Pitfalls of Automating Reviews using Large Language Models

Akhil Pandey Akella; Harish Varma Siravuri; Shaurya Rohatgi

arXiv:2512.22145 · cs.DL · December 30, 2025

TL;DR

This study evaluates the potential and risks of using large language models to automate peer review. It finds that LLMs can assist with pre-review screening but show misalignment and overconfidence relative to human reviewers.

Contribution

The paper provides an experimental analysis of frontier open-weight LLMs for peer review, highlighting their utility and pitfalls, and introduces an open-source dataset for further research.

Findings

01. LLMs show a weak correlation with human reviewer ratings (0.15).
02. Models tend to overestimate review quality by 3–5 points.
03. LLM review scores correlate more strongly with post-publication metrics than human scores do.

Abstract

Large Language Models are versatile general-task solvers, and their capabilities can genuinely assist people with scholarly peer review as pre-review agents, if not as fully autonomous peer-review agents. However beneficial it may be, automating academic peer review as a concept raises concerns about safety, research integrity, and the validity of the academic peer-review process. Most studies that systematically evaluate frontier LLMs generating reviews across scientific disciplines miss the mark on addressing the alignment or misalignment of reviews, as well as the utility of LLM-generated reviews when compared against publication outcomes such as Citations, Hit-papers, Novelty, and Disruption. This paper presents an experimental study in which we gathered ground-truth reviewer ratings from OpenReview and used…

Taxonomy

Topics: Expert finding and Q&A systems · Academic integrity and plagiarism · Academic Publishing and Open Access