Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction
Fouad Trad, Ali Chehab

TL;DR
This paper demonstrates that large multimodal AI agents can effectively detect phishing websites by analyzing URLs and screenshots, and introduces a cost-efficient, two-tiered agentic approach that significantly reduces API expenses while maintaining high detection accuracy.
Contribution
The paper presents a novel two-tiered agentic method that reduces API costs in multimodal phishing detection without sacrificing performance.
Findings
Integrating URLs and screenshots improves detection accuracy.
The two-tiered agentic approach cuts API costs by a factor of up to 4.2.
Cost-effective detection enables scalable phishing defense.
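To see why routing most queries through a cheaper first tier lowers spend, the following is a minimal sketch of the expected per-query cost of a two-tier cascade. The function names, the cost units, and the escalation rate are illustrative assumptions, not figures or code from the paper.

```python
def cascade_cost(url_only_cost: float, multimodal_cost: float,
                 escalation_rate: float) -> float:
    """Expected per-query cost of a two-tier cascade (hypothetical model).

    Every query pays for the URL-only call; only the fraction given by
    escalation_rate also pays for the URL + screenshot call.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be in [0, 1]")
    return url_only_cost + escalation_rate * multimodal_cost


def cost_reduction_factor(url_only_cost: float, multimodal_cost: float,
                          escalation_rate: float) -> float:
    """How many times cheaper the cascade is than always using both inputs."""
    return multimodal_cost / cascade_cost(
        url_only_cost, multimodal_cost, escalation_rate
    )


if __name__ == "__main__":
    # Made-up costs in arbitrary units: the multimodal call is assumed to be
    # 10x the URL-only call, and 20% of queries are assumed to escalate.
    print(cascade_cost(1.0, 10.0, 0.2))
    print(cost_reduction_factor(1.0, 10.0, 0.2))
```

Under this simple model, the saving grows as the first tier resolves more queries on its own and as the gap between URL-only and multimodal token costs widens.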
Abstract
With the rise of sophisticated phishing attacks, there is a growing need for effective and economical detection solutions. This paper explores the use of large multimodal agents, specifically Gemini 1.5 Flash and GPT-4o mini, to analyze both URLs and webpage screenshots via APIs, thus avoiding the complexities of training and maintaining AI systems. Our findings indicate that integrating these two data types substantially enhances detection performance over using either type alone. However, API usage incurs costs per query that depend on the number of input and output tokens. To address this, we propose a two-tiered agentic approach: initially, one agent assesses the URL, and if inconclusive, a second agent evaluates both the URL and the screenshot. This method not only maintains robust detection performance but also significantly reduces API costs by minimizing unnecessary multi-input…
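The two-tiered flow described in the abstract can be sketched as a small routing function. This is an illustrative outline only: the label strings, the `Verdict` type, and the callable signatures are assumptions, and the agents are passed in as plain functions rather than as real Gemini or GPT-4o mini API calls. How the first agent signals an inconclusive result is also assumed here (a dedicated label).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical label vocabulary; the paper's exact prompt outputs are not shown here.
PHISHING = "phishing"
LEGITIMATE = "legitimate"
INCONCLUSIVE = "inconclusive"


@dataclass
class Verdict:
    label: str  # final classification
    tier: int   # 1 if the URL-only agent decided, 2 if escalated


def two_tier_detect(
    url: str,
    url_agent: Callable[[str], str],
    multimodal_agent: Callable[[str, bytes], str],
    load_screenshot: Callable[[str], bytes],
) -> Verdict:
    """Run the cheap URL-only agent first; escalate only when it is unsure."""
    first = url_agent(url)
    if first in (PHISHING, LEGITIMATE):
        return Verdict(label=first, tier=1)
    # The screenshot is fetched lazily so conclusive URLs never incur
    # the extra image tokens of a multimodal call.
    screenshot = load_screenshot(url)
    return Verdict(label=multimodal_agent(url, screenshot), tier=2)


if __name__ == "__main__":
    # Stub agents standing in for real API calls (assumptions for the demo).
    def stub_url_agent(u: str) -> str:
        return PHISHING if "login-verify" in u else INCONCLUSIVE

    def stub_multimodal_agent(u: str, image: bytes) -> str:
        return LEGITIMATE

    def stub_screenshot(u: str) -> bytes:
        return b"fake-image-bytes"

    print(two_tier_detect("http://login-verify.example", stub_url_agent,
                          stub_multimodal_agent, stub_screenshot))
    print(two_tier_detect("http://example.org", stub_url_agent,
                          stub_multimodal_agent, stub_screenshot))
```

Injecting the agents as callables keeps the routing logic independent of any particular provider, so the same function could wrap either model's client.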
Taxonomy
Topics: Spam and Phishing Detection · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining
Methods: Umbrella Reinforcement Learning
