Benchmarking Large Language Models for Zero-shot and Few-shot Phishing URL Detection

Najmul Hasan; Prashanth BusiReddyGari

arXiv:2602.02641·cs.CR·February 4, 2026

Open Access

TL;DR

This paper benchmarks large language models for zero-shot and few-shot phishing URL detection, highlighting their effectiveness and operational trade-offs in combating sophisticated AI-generated phishing attacks.

Contribution

It provides a comprehensive evaluation of LLMs under a unified framework for phishing URL detection, revealing the benefits of few-shot prompting and analyzing performance trade-offs.
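The difference between the zero-shot and few-shot settings can be illustrated with a small prompt-building sketch. This is not the authors' prompt or evaluation framework: the instruction wording, the `build_prompt` and `parse_label` helpers, and the example URLs (all on reserved example domains) are hypothetical and chosen only for illustration. No model is called; the sketch shows only how labeled examples are prepended in the few-shot case.

```python
from typing import Sequence, Tuple

# Hypothetical instruction text; the paper's actual prompt is not shown in the source.
INSTRUCTION = (
    "Classify the following URL as 'phishing' or 'legitimate'. "
    "Answer with a single word."
)

# Hypothetical labeled examples for the few-shot setting (made-up URLs).
EXAMPLES = [
    ("http://secure-login.example.net/verify?account=update", "phishing"),
    ("https://www.example.com/docs/getting-started", "legitimate"),
]


def build_prompt(url: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples)."""
    lines = [INSTRUCTION, ""]
    for example_url, example_label in examples:
        lines.append(f"URL: {example_url}")
        lines.append(f"Label: {example_label}")
        lines.append("")
    # The query URL comes last, with the label left for the model to complete.
    lines.append(f"URL: {url}")
    lines.append("Label:")
    return "\n".join(lines)


def parse_label(response: str) -> str:
    """Map a free-text model response onto one of the two labels, or 'unknown'."""
    text = response.strip().lower()
    if text.startswith("phishing"):
        return "phishing"
    if text.startswith("legitimate"):
        return "legitimate"
    return "unknown"


zero_shot_prompt = build_prompt("https://example.org/reset-password")
few_shot_prompt = build_prompt("https://example.org/reset-password", EXAMPLES)
```

The same function covers both settings, so any performance difference between them in an experiment of this shape would come from the in-context examples alone rather than from a change in instruction wording.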

Findings

01. Few-shot prompting improves LLM performance.

02. Performance varies across models and metrics.

03. Operational trade-offs are identified in detection settings.

Abstract

The Uniform Resource Locator (URL), introduced in a connectivity-first era to define access and locate resources, remains historically limited, lacking future-proof mechanisms for security, trust, or resilience against fraud and abuse, despite the introduction of reactive protections like HTTPS during the cybersecurity era. In the current AI-first threatscape, deceptive URLs have reached unprecedented sophistication due to the widespread use of generative AI by cybercriminals and the AI-vs-AI arms race to produce context-aware phishing websites and URLs that are virtually indistinguishable to both users and traditional detection tools. Although AI-generated phishing accounted for a small fraction of filter-bypassing attacks in 2024, phishing volume has escalated over 4,000% since 2022, with nearly 50% more attacks evading detection. At the rate the threatscape is escalating, and…

Taxonomy

Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Cybercrime and Law Enforcement Studies