Evaluating Large Language Models' Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects
Fred Heiding, Simon Lermen, Andrew Kao, Bruce Schneier, Arun Vishwanath

TL;DR
This study demonstrates that large language models can autonomously conduct highly effective spear phishing campaigns, matching human performance and significantly surpassing traditional methods, with implications for cybersecurity and AI capabilities.
Contribution
The paper introduces a fully automated AI tool for spear phishing that outperforms previous models and compares its effectiveness with human experts and older AI systems.
Findings
AI-automated phishing achieved a 54% click-through rate, matching human experts.
AI information gathering was 88% accurate, with inaccuracies in only 4% of cases.
AI enables targeting larger audiences at up to 50% lower cost.
Abstract
In this paper, we evaluate the capability of large language models to conduct personalized phishing attacks and compare their performance with human experts and AI models from last year. We include four email groups with a combined total of 101 participants: a control group of arbitrary phishing emails, which received a click-through rate (recipient pressed a link in the email) of 12%, emails generated by human experts (54% click-through), fully AI-automated emails (54% click-through), and AI emails utilizing a human-in-the-loop (56% click-through). Thus, the AI-automated attacks performed on par with human experts and 350% better than the control group. The results are a significant improvement from similar studies conducted last year, highlighting the increased deceptive capabilities of AI models. Our AI-automated emails were sent using a custom-built tool that automates the entire…
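The "350% better" figure can be checked from the click-through rates quoted in the abstract. The sketch below assumes the figure is a relative increase over the control group's rate; the function name `relative_improvement` is hypothetical and used only for illustration.

```python
def relative_improvement(rate: float, baseline: float) -> float:
    """Return how much `rate` exceeds `baseline`, as a percentage of `baseline`."""
    return (rate - baseline) / baseline * 100


# Click-through rates (in percent) as reported in the abstract.
click_through = {
    "control": 12,
    "human experts": 54,
    "fully AI-automated": 54,
    "AI with human-in-the-loop": 56,
}

baseline = click_through["control"]
for group, rate in click_through.items():
    if group == "control":
        continue
    gain = relative_improvement(rate, baseline)
    print(f"{group}: {rate}% click-through, {gain:.0f}% above control")
```

Under this reading, a 54% rate against a 12% baseline is 4.5 times the control rate, which corresponds to a 350% relative increase.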
Taxonomy
Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Topic Modeling
