PhishIntentionLLM: Uncovering Phishing Website Intentions through Multi-Agent Retrieval-Augmented Generation

Wenhao Li; Selvakumar Manickam; Yung-wey Chong; Shankar Karuppayah

arXiv:2507.15419·cs.CR·July 22, 2025

TL;DR

This paper introduces PhishIntentionLLM, a multi-agent retrieval-augmented generation (RAG) framework that uses the visual-language capabilities of LLMs to identify phishing website intentions from screenshots, and that outperforms a single-agent baseline.

Contribution

It presents the first phishing intention ground truth dataset (~2K samples), a multi-agent RAG framework that leverages LLMs for intention recognition, and an evaluation on four commercial LLMs showing gains over a single-agent baseline.

Findings

01 Achieves 0.7895 micro-precision with GPT-4o.

02 Outperforms single-agent baseline by ~95%.

03 Improves credential theft precision to 0.8545.

Abstract

Phishing websites remain a major cybersecurity threat, yet existing methods primarily focus on detection, while the recognition of underlying malicious intentions remains largely unexplored. To address this gap, we propose PhishIntentionLLM, a multi-agent retrieval-augmented generation (RAG) framework that uncovers phishing intentions from website screenshots. Leveraging the visual-language capabilities of large language models (LLMs), our framework identifies four key phishing objectives: Credential Theft, Financial Fraud, Malware Distribution, and Personal Information Harvesting. We construct and release the first phishing intention ground truth dataset (~2K samples) and evaluate the framework using four commercial LLMs. Experimental results show that PhishIntentionLLM achieves a micro-precision of 0.7895 with GPT-4o and significantly outperforms the single-agent baseline with a ~95%…
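
The headline metric above is micro-precision over the four intention classes. As a minimal sketch of how such a figure can be computed, the Python below pools true positives and false positives across all samples and labels before dividing. It assumes a page may carry more than one intention label, which the abstract does not state explicitly; the function name `micro_precision` and the toy data are hypothetical and are not taken from the paper or its dataset.

```python
from typing import Sequence, Set

# The four intention classes named in the abstract.
INTENTIONS = (
    "Credential Theft",
    "Financial Fraud",
    "Malware Distribution",
    "Personal Information Harvesting",
)


def micro_precision(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]) -> float:
    """Micro-averaged precision: pooled TP / (pooled TP + pooled FP)."""
    tp = 0  # predicted labels that are in the ground truth
    fp = 0  # predicted labels that are not in the ground truth
    for truth, pred in zip(y_true, y_pred):
        tp += len(pred & truth)
        fp += len(pred - truth)
    # With no predictions at all, precision is undefined; return 0.0 by convention.
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy data (illustrative only, not from the paper).
toy_truth = [
    {"Credential Theft"},
    {"Financial Fraud", "Personal Information Harvesting"},
    {"Malware Distribution"},
]
toy_pred = [
    {"Credential Theft", "Personal Information Harvesting"},  # one spurious label
    {"Financial Fraud"},                                      # one missed label
    {"Malware Distribution"},
]

print(micro_precision(toy_truth, toy_pred))  # 3 TP, 1 FP -> 0.75
```

Because counts are pooled before dividing, frequent classes weigh more heavily than rare ones. Missed labels (false negatives) do not lower precision; they would show up in recall instead.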
