Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks

Tzu-Ling Lin; Wei-Chih Chen; Teng-Fang Hsiao; Hou-I Liu; Ya-Hsin Yeh; Yu Kai Chan; Wen-Sheng Lien; Po-Yen Kuo; Philip S. Yu; Hong-Han Shuai

arXiv:2506.11113 · cs.CL · October 10, 2025

TL;DR

This paper evaluates the vulnerability of large language models used as automated peer reviewers to textual adversarial attacks, highlighting significant robustness issues that threaten the reliability of AI-assisted peer review processes.

Contribution

The paper provides a comprehensive assessment of LLM robustness in automated peer review, revealing vulnerabilities to textual adversarial attacks and discussing mitigation strategies to improve reliability.

Findings

01. Text manipulations can significantly distort LLM assessments.

02. LLMs are less reliable under adversarial attacks compared to human reviewers.

03. Addressing adversarial risks is crucial for trustworthy AI in peer review.

Abstract

Peer review is essential for maintaining academic quality, but the increasing volume of submissions places a significant burden on reviewers. Large language models (LLMs) offer potential assistance in this process, yet their susceptibility to textual adversarial attacks raises reliability concerns. This paper investigates the robustness of LLMs used as automated reviewers in the presence of such attacks. We focus on three key questions: (1) The effectiveness of LLMs in generating reviews compared to human reviewers. (2) The impact of adversarial attacks on the reliability of LLM-generated reviews. (3) Challenges and potential mitigation strategies for LLM-based review. Our evaluation reveals significant vulnerabilities, as text manipulations can distort LLM assessments. We offer a comprehensive evaluation of LLM performance in automated peer reviewing and analyze its robustness against…
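
As a rough illustration of how the effect of a text manipulation on a reviewer's score could be quantified, the sketch below perturbs each paper's text and reports the mean absolute change in score. Everything here is an assumption made for illustration: `swap_adjacent_chars` is a toy character-level perturbation, `toy_score` is a hypothetical stand-in for an LLM reviewer, and `mean_score_shift` is not the metric used in the paper.

```python
from statistics import mean
from typing import Callable, Sequence


def swap_adjacent_chars(text: str, every: int = 5) -> str:
    """Toy perturbation (assumption): swap the two middle characters
    of every `every`-th word that has at least four characters."""
    words = text.split(" ")
    for i in range(0, len(words), every):
        w = words[i]
        if len(w) >= 4:
            mid = len(w) // 2
            words[i] = w[: mid - 1] + w[mid] + w[mid - 1] + w[mid + 1:]
    return " ".join(words)


def toy_score(text: str) -> float:
    """Hypothetical stand-in for an LLM reviewer: a base score of 5
    plus one point per occurrence of the word 'novel'."""
    return 5.0 + text.lower().count("novel")


def mean_score_shift(
    papers: Sequence[str],
    score: Callable[[str], float],
    perturb: Callable[[str], str],
) -> float:
    """Mean absolute difference between scores on perturbed and original text."""
    return mean(abs(score(perturb(p)) - score(p)) for p in papers)
```

In a real evaluation, `score` would wrap a call to the reviewing model and `perturb` would be one of the attacks under study; a larger shift indicates a less robust reviewer.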

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning · Topic Modeling