Can AI Models be Jailbroken to Phish Elderly Victims? An End-to-End Evaluation
Fred Heiding, Simon Lermen

TL;DR
This paper demonstrates how current AI models can be jailbroken to generate phishing content that successfully compromises elderly victims, exposing significant failures in AI safety measures.
Contribution
It provides the first comprehensive end-to-end evaluation of AI jailbreak attacks leading to real-world harm to vulnerable populations, especially the elderly.
Findings
Several of the six frontier LLMs evaluated showed near-complete susceptibility to certain attack vectors.
In a human validation study with 108 senior volunteers, AI-generated phishing emails compromised 11% of participants.
Current safety guardrails are insufficient against sophisticated attacks.
Abstract
We present an end-to-end demonstration of how attackers can exploit AI safety failures to harm vulnerable populations: from jailbreaking LLMs to generate phishing content, to deploying those messages against real targets, to successfully compromising elderly victims. We systematically evaluated safety guardrails across six frontier LLMs spanning four attack categories, revealing critical failures where several models exhibited near-complete susceptibility to certain attack vectors. In a human validation study with 108 senior volunteers, AI-generated phishing emails successfully compromised 11% of participants. Our work uniquely demonstrates the complete attack pipeline targeting elderly populations, highlighting that current AI safety measures fail to protect those most vulnerable to fraud. Beyond generating phishing content, LLMs enable attackers to overcome language barriers and…
Taxonomy
Topics: Spam and Phishing Detection · AI in Service Interactions · Ethics and Social Impacts of AI
