Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing

Panagiotis Theocharopoulos; Ajinkya Kulkarni; Mathew Magimai.-Doss

arXiv:2512.23684 · cs.CL · December 30, 2025

TL;DR

This paper demonstrates that multilingual hidden prompt injection attacks can significantly influence LLM-based academic review outcomes, with effectiveness varying across languages, raising concerns about the security of AI-assisted peer review systems.

Contribution

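The study builds a dataset of approximately 500 real academic papers accepted to ICML and systematically evaluates how hidden prompt injections, written as semantically equivalent instructions in four languages, affect LLM review scores and accept/reject decisions.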

Findings

01. Prompt injections in English, Japanese, and Chinese substantially change review scores and accept/reject decisions.

02. Arabic injections produce little to no effect.

03. Vulnerability to prompt injection differs notably across languages.

Abstract

Large language models (LLMs) are increasingly considered for use in high-impact workflows, including academic peer review. However, LLMs are vulnerable to document-level hidden prompt injection attacks. In this work, we construct a dataset of approximately 500 real academic papers accepted to ICML and evaluate the effect of embedding hidden adversarial prompts within these documents. Each paper is injected with semantically equivalent instructions in four different languages and reviewed using an LLM. We find that prompt injection induces substantial changes in review scores and accept/reject decisions for English, Japanese, and Chinese injections, while Arabic injections produce little to no effect. These results highlight the susceptibility of LLM-based reviewing systems to document-level prompt injection and reveal notable differences in vulnerability across languages.
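
The comparison described in the abstract, reviewing each paper with and without an injected prompt and measuring the change per language, can be sketched roughly as below. This is a minimal illustration and not the authors' code: the `Review` record, the `injection_effects` helper, the score scale, and the language codes are all hypothetical, and the sample values are made-up placeholders, not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Review:
    """One LLM review of one paper (hypothetical record layout)."""
    paper_id: str
    language: str  # injection language, or "none" for the clean baseline
    score: float   # numeric review score on an assumed scale
    accept: bool   # accept/reject decision


def injection_effects(reviews):
    """Return {language: (mean score shift, decision flip rate)} vs. the baseline."""
    baseline = {r.paper_id: r for r in reviews if r.language == "none"}
    shifts = defaultdict(list)
    flips = defaultdict(list)
    for r in reviews:
        # Skip baseline reviews and papers that have no baseline to compare to.
        if r.language == "none" or r.paper_id not in baseline:
            continue
        base = baseline[r.paper_id]
        shifts[r.language].append(r.score - base.score)
        flips[r.language].append(r.accept != base.accept)
    return {
        lang: (mean(shifts[lang]), sum(flips[lang]) / len(flips[lang]))
        for lang in shifts
    }


# Placeholder data for illustration only; these are NOT the paper's numbers.
reviews = [
    Review("p1", "none", 4.0, False),
    Review("p1", "en", 6.0, True),
    Review("p1", "ar", 4.0, False),
    Review("p2", "none", 5.0, True),
    Review("p2", "en", 6.0, True),
    Review("p2", "ar", 5.0, True),
]

effects = injection_effects(reviews)
for lang, (shift, flip_rate) in sorted(effects.items()):
    print(f"{lang}: mean score shift {shift:+.2f}, decision flip rate {flip_rate:.0%}")
```

Pairing each injected review with the clean review of the same paper keeps the comparison within-paper, so differences between languages are not confounded by differences in paper quality.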

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Academic integrity and plagiarism