The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Creates Exploitable Vulnerabilities
Ron Litvak

TL;DR
This paper investigates how system prompt configurations influence the security of LLM email agents, revealing that prompt design significantly affects vulnerability to phishing attacks and that optimizing prompts can both improve detection and introduce brittleness.
Contribution
The paper introduces PhishNChips, a comprehensive study of how prompt-model interactions affect security, and proposes Safetility, a metric that balances detection performance against false-positive costs.
Findings
Prompt configuration drastically affects phishing bypass rates.
Optimized prompts achieve high recall but increase brittleness.
Specific prompt strategies can be exploited by attackers using inverted signals.
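The findings above are stated in terms of bypass rate, recall, and false-positive rate. The sketch below shows how those quantities relate for a binary phishing classifier. It assumes "bypass rate" means the share of phishing emails the agent fails to flag, which makes it the complement of recall. The `Confusion` class is hypothetical, and the counts are made up to reproduce the headline 93.7% recall and 3.8% false-positive rate; they are not the paper's dataset sizes. Safetility itself is not defined here, because its formula is not given in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for a binary phishing classifier (positive class = phishing)."""
    tp: int  # phishing emails correctly flagged
    fn: int  # phishing emails that slipped through (bypasses)
    fp: int  # legitimate emails wrongly flagged
    tn: int  # legitimate emails correctly allowed

    def recall(self) -> float:
        # Fraction of phishing emails that were caught.
        return self.tp / (self.tp + self.fn)

    def bypass_rate(self) -> float:
        # Assumed definition: fraction of phishing emails NOT flagged (1 - recall).
        return self.fn / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        # Fraction of legitimate emails wrongly flagged as phishing.
        return self.fp / (self.fp + self.tn)


# Hypothetical counts: 1,000 phishing and 1,000 legitimate emails.
c = Confusion(tp=937, fn=63, fp=38, tn=962)
print(f"recall={c.recall():.3f} bypass={c.bypass_rate():.3f} "
      f"fpr={c.false_positive_rate():.3f}")
# prints: recall=0.937 bypass=0.063 fpr=0.038
```

Because bypass rate and recall are two views of the same phishing-class errors, a metric that also accounts for false-positive cost has to bring in the legitimate-email counts (`fp`, `tn`) separately.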
Abstract
System prompt configuration can make the difference between near-total phishing blindness and near-perfect detection in LLM email agents. We present PhishNChips, a study of 11 models under 10 prompt strategies, showing that prompt-model interaction is a first-order security variable: a single model's phishing bypass rate ranges from under 1% to 97% depending on how it is configured, while the false-positive cost of the same prompt varies sharply across models. We then show that optimizing prompts around highly predictive signals can improve benchmark performance, reaching up to 93.7% recall at a 3.8% false-positive rate, but also creates a brittle attack surface. In particular, domain-matching strategies perform well when legitimate emails mostly have matched sender and URL domains, yet degrade sharply when attackers invert that signal by registering matching infrastructure.…
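The inverted-signal weakness can be illustrated with a deterministic stand-in for the cue a domain-matching prompt asks the model to weigh. The paper's strategies are system prompts given to an LLM, not code, so this is only a minimal sketch of the signal itself. `domain_match_verdict` and its helpers are hypothetical. The domain extraction naively keeps the last two labels (real systems should use the Public Suffix List), and all addresses are made up. The rule catches a classic mismatched link but passes an attacker who registers one look-alike domain and uses it for both the sender and the links.

```python
from urllib.parse import urlparse


def registered_domain(host: str) -> str:
    # Naive approximation: keep the last two DNS labels.
    # Wrong for suffixes like "co.uk"; use the Public Suffix List in practice.
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def sender_domain(address: str) -> str:
    # Take everything after the last "@" in the sender address.
    return registered_domain(address.rsplit("@", 1)[-1])


def domain_match_verdict(sender: str, urls: list[str]) -> str:
    """Hypothetical rule: flag the email if any link's domain differs from the sender's."""
    expected = sender_domain(sender)
    for url in urls:
        host = urlparse(url).hostname or ""
        if registered_domain(host) != expected:
            return "phishing"
    return "legitimate"


cases = [
    # Legitimate: sender and link share a domain.
    ("billing@example.com", ["https://www.example.com/invoice"]),
    # Classic phish: link points somewhere other than the sender's domain.
    ("support@example.com", ["https://example-login.net/verify"]),
    # Inverted signal: attacker controls one look-alike domain for both sender and link.
    ("security@examp1e-support.com", ["https://login.examp1e-support.com/reset"]),
]
for sender, urls in cases:
    print(sender, "->", domain_match_verdict(sender, urls))
```

The third case is labelled legitimate even though it is an attack: once the attacker owns matching infrastructure, the cue that made the strategy accurate on benchmark data points the wrong way.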
Taxonomy
Topics: Spam and Phishing Detection · Advanced Malware Detection Techniques · Network Security and Intrusion Detection
