When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection

Devanshu Sahoo; Manish Prasad; Vasudev Majhi; Jahnvi Singh; Vinay Chamola; Yash Sinha; Murari Mandal; Dhruv Kumar

arXiv:2512.10449·cs.AI·January 7, 2026

Open Access

TL;DR

This paper assesses how vulnerable LLM-based scientific review systems are to adversarial PDF manipulation, introduces the Weighted Adversarial Vulnerability Score (WAVS) as a metric, and shows that the attack strategies flip review decisions at high rates.

Contribution

The paper introduces WAVS to quantify adversarial vulnerability and evaluates 15 attack strategies across 13 language models, revealing significant susceptibility in current review systems.

Findings

01. Obfuscation techniques achieve up to 86.26% decision flip rate.

02. Certain proprietary systems exhibit unique reasoning traps.

03. The study provides datasets and frameworks for future research.

Abstract

Driven by surging submission volumes, scientific peer review has catalyzed two parallel trends: individual over-reliance on LLMs and institutional AI-powered assessment systems. This study investigates the robustness of "LLM-as-a-Judge" systems to adversarial PDF manipulation via invisible text injections and layout-aware encoding attacks. We specifically target the distinct incentive of flipping "Reject" decisions to "Accept," a vulnerability that fundamentally compromises scientific integrity. To measure this, we introduce the Weighted Adversarial Vulnerability Score (WAVS), a novel metric that quantifies susceptibility by weighting score inflation against the severity of decision shifts relative to ground truth. We adapt 15 domain-specific attack strategies, ranging from semantic persuasion to cognitive obfuscation, and evaluate them across 13 diverse language models (including GPT-5…
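The "Reject" to "Accept" flip that the abstract targets can be illustrated with a small calculation. The Python sketch below assumes a decision flip rate means the share of papers rejected on the clean PDF that are accepted once the injection is present. The function name, the label strings, and the sample decisions are hypothetical; this is neither the paper's evaluation code nor its WAVS metric, whose formula is not given here.

```python
from typing import Sequence


def decision_flip_rate(clean: Sequence[str], attacked: Sequence[str]) -> float:
    """Share of clean-PDF rejections that become accepts after the attack.

    Assumed definition for illustration only; the paper may define it differently.
    """
    if len(clean) != len(attacked):
        raise ValueError("decision lists must have the same length")
    # Only papers rejected without the attack are eligible to flip.
    rejected = [i for i, decision in enumerate(clean) if decision == "reject"]
    if not rejected:
        return 0.0
    flipped = sum(1 for i in rejected if attacked[i] == "accept")
    return flipped / len(rejected)


# Hypothetical decisions for four papers, before and after injection.
clean = ["reject", "reject", "accept", "reject"]
attacked = ["accept", "reject", "accept", "accept"]
rate = decision_flip_rate(clean, attacked)
print(f"{rate:.2%}")
```

In this toy sample, three papers are rejected on the clean PDF and two of them are accepted after the attack, so the rate is two thirds. Restricting the denominator to clean rejections keeps papers that were already accepted from diluting the figure.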

Taxonomy

Topics: Academic integrity and plagiarism · Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection