Clouding the Mirror: Stealthy Prompt Injection Attacks Targeting LLM-based Phishing Detection
Takashi Koide, Hiroki Nakano, Daiki Chiba

TL;DR
This paper investigates how prompt injection attacks can stealthily manipulate LLM-based phishing detection systems, revealing vulnerabilities and proposing a defense framework to enhance their robustness.
Contribution
It introduces a comprehensive taxonomy for prompt injection attacks, empirically evaluates vulnerabilities in state-of-the-art models, and proposes InjectDefuser, a multi-faceted defense framework.
Findings
State-of-the-art models like GPT-5 are vulnerable to prompt injection.
The proposed InjectDefuser significantly reduces attack success rates.
A two-dimensional taxonomy effectively captures realistic prompt injection strategies.
Abstract
Phishing sites continue to grow in volume and sophistication. Recent work leverages large language models (LLMs) to analyze URLs, HTML, and rendered content to decide whether a website is a phishing site. While these approaches are promising, LLMs are inherently vulnerable to prompt injection (PI). Because attackers can fully control various elements of phishing sites, this creates the potential for PI that exploits the perceptual asymmetry between LLMs and humans: instructions imperceptible to end users can still be parsed by the LLM and can stealthily manipulate its judgment. The specific risks of PI in phishing detection and effective mitigation strategies remain largely unexplored. This paper presents the first comprehensive evaluation of PI against multimodal LLM-based phishing detection. We introduce a two-dimensional taxonomy, defined by Attack Techniques and Attack Surfaces,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpam and Phishing Detection · Misinformation and Its Impacts · Advanced Malware Detection Techniques
