Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions

Xuyang Guo; Zekai Huang; Zhao Song; Jiahao Zhang

arXiv:2508.13214·cs.CR·August 20, 2025

Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions

Xuyang Guo, Zekai Huang, Zhao Song, Jiahao Zhang

PDF

4 Reviews

TL;DR

This paper demonstrates that large language models are vulnerable to prompt injection attacks even on simple multiple-choice questions, raising concerns about their robustness in critical applications.

Contribution

The study introduces a straightforward prompt injection attack method on LLMs using basic arithmetic questions in PDFs, revealing significant robustness vulnerabilities.

Findings

01

LLMs can be misled by hidden prompts in simple questions

02

Prompt injection attacks are effective even in trivial scenarios

03

Robustness issues pose risks for LLMs in judgment tasks

Abstract

Large Language Models (LLMs) have recently demonstrated strong emergent abilities in complex reasoning and zero-shot generalization, showing unprecedented potential for LLM-as-a-judge applications in education, peer review, and data quality evaluation. However, their robustness under prompt injection attacks, where malicious instructions are embedded into the content to manipulate outputs, remains a significant concern. In this work, we explore a frustratingly simple yet effective attack setting to test whether LLMs can be easily misled. Specifically, we evaluate LLMs on basic arithmetic questions (e.g., "What is 3 + 2?") presented as either multiple-choice or true-false judgment problems within PDF files, where hidden prompts are injected into the file. Our results reveal that LLMs are indeed vulnerable to such hidden prompt injection attacks, even in these trivial scenarios,…

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01Rating 2Confidence 4

Strengths

**1) Clear, reproducible setup:** The authors define a simple, interpretable pipeline (Eq. 1) for prompt injection in PDFs using LaTeX color control. **2) Novel variant of a known problem** Prior works (e.g., Liu et al., 2024c; Guo et al., 2024; Raina et al., 2024) study prompt injection generally, but few explore the PDF-hidden variant. The work highlights this under-studied vector. **3) Empirical value.** Confirms that even simple visual-level attacks can bypass superficial safety filters in

Weaknesses

**1) Trivial methodology, no insight beyond anecdote:** This paper's contribution is essentially a reproduction with simpler math tasks. The authors never quantify why certain models succumb, i.e., there is no causal insight, just observed failure. **2) Excessive space on prompt instantiation, minimal analysis:** Over two pages are devoted to LaTeX examples of "black", "white", and "no" prompt injections. This is implementation detail; the space could instead show token-level model traces or wh

Reviewer 02Rating 0Confidence 4

Strengths

The paper is overall well written and straightforward to read. The authors also aim to contribute to an important area, LLM security.

Weaknesses

- **This work seems to lack novelty.** The authors overlook a key related work “Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models” published in EMNLP Findings 2025 which investigates very similar “font injection” attacks. The setting they investigate is more general and their experimental investigation is more extensive. - **The experimental investigation is very limited.** It seems like the authors only do an experimental evaluation on

Reviewer 03Rating 2Confidence 5

Strengths

The paper discusses the important issue of fairness and robustness of LLMs when used as evaluators. The authors' initial exploration is promising. By successfully applying different prompt injection attacks on powerful LLMs, the paper effectively demonstrates the existence of these vulnerabilities.

Weaknesses

Despite the promising direction, the paper suffers from several major weaknesses in its current form: 1. The main conclusion of the paper appears to be trivial and somewhat obvious. The vulnerability of LLMs to injection attacks has already been well-established in a large body of prior work. This paper's analysis, while confirmatory, does little more than reiterate this known phenomenon. Consequently, the primary research question 1 posed by the authors is not a true research question, as its

Reviewer 04Rating 2Confidence 5

Strengths

1. The paper studies the reliability of LLM-as-a-judge applications under prompt injection, which is a timely area as AI integration becomes more common. 2. The paper is generally well presented and easy to understand.

Weaknesses

1. The core discovery is LLMs are "fooled" with injected prompts. However, essentially it's an instruction-following model following instructions (injected prompt) found in the input text. The finding is expected and not surprising. The experiments, for example black-text and white-text prompts in pdf, do not provide much insight. The scientific or technical contribution is quite limited. 2. Table 1 contains objectively wrong parameter counts for GPT-4o, o3, and DeepSeek. 3. The authors mentione

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.