PeerPrism: Peer Evaluation Expertise vs Review-writing AI

Soroush Sadeghian; Alireza Daqiq; Radin Cheraghi; Sajad Ebrahimi; Negar Arabzadeh; Ebrahim Bagheri

arXiv:2604.14513·cs.CL·April 17, 2026

TL;DR

PeerPrism introduces a large-scale benchmark for evaluating how well LLM detection methods distinguish between human and AI contributions in peer reviews, emphasizing the complexity of hybrid human-AI collaboration.

Contribution

This work presents the first benchmark explicitly designed to disentangle idea provenance from text provenance in peer reviews, highlighting limitations of current detection methods.

Findings

01. Detection methods perform well on binary human vs. AI tasks.

02. Detectors often disagree when ideas are human but text is AI-generated.

03. Current methods conflate surface style with intellectual contribution.
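One way to make the disagreement in the second finding concrete is a pairwise disagreement rate between detectors on the same set of reviews. The sketch below is illustrative only: the detector names and the toy predictions are hypothetical, and nothing here is taken from the paper's evaluation code or its results.

```python
from itertools import combinations


def pairwise_disagreement(predictions):
    """Fraction of reviews on which each pair of detectors gives different labels.

    `predictions` maps a detector name to a list of binary labels
    (1 = flagged as AI-written, 0 = not flagged), one label per review.
    """
    rates = {}
    for a, b in combinations(sorted(predictions), 2):
        labels_a, labels_b = predictions[a], predictions[b]
        if len(labels_a) != len(labels_b) or not labels_a:
            raise ValueError("detectors must label the same non-empty set of reviews")
        differing = sum(x != y for x, y in zip(labels_a, labels_b))
        rates[(a, b)] = differing / len(labels_a)
    return rates


# Hypothetical outputs of three detectors on four hybrid reviews
# (human ideas, AI-generated text). These values are made up.
toy_predictions = {
    "detector_a": [1, 1, 0, 1],
    "detector_b": [1, 0, 0, 0],
    "detector_c": [1, 1, 0, 0],
}

rates = pairwise_disagreement(toy_predictions)
for pair, rate in rates.items():
    print(pair, rate)
```

A rate near zero on fully human or fully synthetic reviews combined with a high rate on hybrid reviews would match the pattern the finding describes, though the actual figures have to come from the benchmark itself.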

Abstract

Large Language Models (LLMs) are increasingly used in scientific peer review, assisting with drafting, rewriting, expansion, and refinement. However, existing peer-review LLM detection methods largely treat authorship as a binary problem (human vs. AI) without accounting for the hybrid nature of modern review workflows. In practice, evaluative ideas and surface realization may originate from different sources, creating a spectrum of human-AI collaboration. In this work, we introduce PeerPrism, a large-scale benchmark of 20,690 peer reviews explicitly designed to disentangle idea provenance from text provenance. We construct controlled generation regimes spanning fully human, fully synthetic, and multiple hybrid transformations. This design enables systematic evaluation of whether detectors identify the origin of the surface text or the origin of the evaluative reasoning. We benchmark…
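The distinction between idea provenance and text provenance can be pictured as a two-axis label rather than a single human/AI flag. The following is a minimal sketch under that reading of the abstract; the class names, field names, and the example regimes are hypothetical and are not the benchmark's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    HUMAN = "human"
    AI = "ai"


@dataclass(frozen=True)
class ReviewProvenance:
    """Two independent labels for one review (hypothetical schema)."""
    idea_source: Source  # who produced the evaluative reasoning
    text_source: Source  # who produced the surface wording

    def is_hybrid(self) -> bool:
        # A review is hybrid when ideas and wording come from different sources.
        return self.idea_source != self.text_source


def binary_text_label(p: ReviewProvenance) -> Source:
    """What a purely surface-level detector would ideally report."""
    return p.text_source


# Illustrative regimes; the names are assumptions, not the paper's terminology.
fully_human = ReviewProvenance(Source.HUMAN, Source.HUMAN)
fully_synthetic = ReviewProvenance(Source.AI, Source.AI)
ai_rewrite_of_human = ReviewProvenance(Source.HUMAN, Source.AI)

for name, p in [("fully_human", fully_human),
                ("fully_synthetic", fully_synthetic),
                ("ai_rewrite_of_human", ai_rewrite_of_human)]:
    print(name, p.is_hybrid(), binary_text_label(p).value)
```

Under this framing, a binary detector assigns the same label to `fully_synthetic` and `ai_rewrite_of_human`, even though the evaluative ideas in the second case are human. That collapse is what a two-axis benchmark design makes measurable.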

Code & Models

Repositories

Reviewerly-Inc/PeerPrism (GitHub)
