When AI reviews science: Can we trust the referee?
Jialiang Wang, Yuchen Liu, Hang Xu, Kaichun Hu, Shimin Di, Wangze Ni, Linan Yue, Min-Ling Zhang, Kui Ren, Lei Chen

TL;DR
This paper critically examines the reliability of AI in peer review: it proposes a taxonomy of attacks on AI reviewers and identifies vulnerabilities through an empirical analysis of LLM-based referees on ICLR submissions.
Contribution
It introduces a comprehensive taxonomy of attacks on AI peer review and provides an empirical audit, with causal analysis, of AI review reliability on real submissions.
Findings
AI reviews are susceptible to prompt injections and adversarial manipulation.
Experimental probes reveal biases and vulnerabilities affecting review scores.
The study offers a baseline for evaluating and improving AI peer review systems.
Abstract
The volume of scientific submissions continues to climb, outpacing the capacity of qualified human referees and stretching editorial timelines. At the same time, modern large language models (LLMs) offer impressive capabilities in summarization, fact-checking, and literature triage, making the integration of AI into peer review increasingly attractive and, in practice, unavoidable. Yet early deployments and informal adoption have exposed acute failure modes. Recent incidents have revealed that hidden prompt injections embedded in manuscripts can steer LLM-generated reviews toward unjustifiably positive judgments. Complementary studies have also demonstrated brittleness to adversarial phrasing, authority and length biases, and hallucinated claims. These episodes raise a central question for scholarly communication: when AI reviews science, can we trust the AI referee? This paper…
