Clouding the Mirror: Stealthy Prompt Injection Attacks Targeting LLM-based Phishing Detection

Takashi Koide; Hiroki Nakano; Daiki Chiba

arXiv:2602.05484·cs.CR·February 6, 2026

Clouding the Mirror: Stealthy Prompt Injection Attacks Targeting LLM-based Phishing Detection

Takashi Koide, Hiroki Nakano, Daiki Chiba

PDF

Open Access

TL;DR

This paper investigates how prompt injection attacks can stealthily manipulate LLM-based phishing detection systems, revealing vulnerabilities and proposing a defense framework to enhance their robustness.

Contribution

It introduces a comprehensive taxonomy for prompt injection attacks, empirically evaluates vulnerabilities in state-of-the-art models, and proposes InjectDefuser, a multi-faceted defense framework.

Findings

01

State-of-the-art models like GPT-5 are vulnerable to prompt injection.

02

The proposed InjectDefuser significantly reduces attack success rates.

03

A two-dimensional taxonomy effectively captures realistic prompt injection strategies.

Abstract

Phishing sites continue to grow in volume and sophistication. Recent work leverages large language models (LLMs) to analyze URLs, HTML, and rendered content to decide whether a website is a phishing site. While these approaches are promising, LLMs are inherently vulnerable to prompt injection (PI). Because attackers can fully control various elements of phishing sites, this creates the potential for PI that exploits the perceptual asymmetry between LLMs and humans: instructions imperceptible to end users can still be parsed by the LLM and can stealthily manipulate its judgment. The specific risks of PI in phishing detection and effective mitigation strategies remain largely unexplored. This paper presents the first comprehensive evaluation of PI against multimodal LLM-based phishing detection. We introduce a two-dimensional taxonomy, defined by Attack Techniques and Attack Surfaces,…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsSpam and Phishing Detection · Misinformation and Its Impacts · Advanced Malware Detection Techniques