Small Language Models for Phishing Website Detection: Cost, Performance, and Privacy Trade-Offs

Georg Goldenits; Philip Koenig; Sebastian Raubitzek; Andreas Ekelhart

arXiv:2511.15434 · cs.CR · November 20, 2025


TL;DR

This paper evaluates small language models for detecting phishing websites using raw HTML, analyzing their accuracy, resource needs, and cost-efficiency to explore a local, privacy-preserving alternative to large proprietary models.

Contribution

It systematically benchmarks 15 small language models for phishing detection and highlights the trade-offs between detection performance and resource consumption relative to large models.
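The benchmarking described above weighs detection quality against resource use. The sketch below shows one way such a comparison could be scored. It assumes binary labels (1 = phishing, 0 = benign), phishing-class precision, recall, and F1, and wall-clock latency as a rough stand-in for resource cost. These choices are illustrative assumptions, not the paper's stated protocol. The `benchmark` function and the keyword-based `toy` classifier are hypothetical. In a real comparison, `toy` would be replaced by a call into each small language model.

```python
import time
from typing import Callable, Dict, Sequence


def benchmark(classify: Callable[[str], int],
              pages: Sequence[str],
              labels: Sequence[int]) -> Dict[str, float]:
    """Score a phishing classifier (1 = phishing, 0 = benign) on labelled HTML pages.

    Returns accuracy, precision, recall and F1 for the phishing class, plus the
    mean wall-clock latency per page as a crude proxy for resource cost.
    """
    if len(pages) != len(labels) or not pages:
        raise ValueError("pages and labels must be non-empty and the same length")
    tp = fp = tn = fn = 0
    elapsed = 0.0
    for html, truth in zip(pages, labels):
        start = time.perf_counter()
        pred = classify(html)
        elapsed += time.perf_counter() - start
        if pred not in (0, 1):
            raise ValueError(f"classifier returned {pred!r}, expected 0 or 1")
        # Tally the confusion matrix with phishing as the positive class.
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif truth == 0:
            tn += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mean_latency_s": elapsed / len(labels),
    }


# Hypothetical toy classifier standing in for a language model: flags any page with a password field.
toy = lambda html: int("password" in html.lower())

pages = ["<form><input type='password'></form>", "<p>hello</p>",
         "<input type=password>", "<p>login</p>"]
labels = [1, 0, 1, 1]
print(benchmark(toy, pages, labels))
```

Running the same harness over several models, each wrapped as a `classify` callable, gives one row per model for a size, accuracy, and cost comparison.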

Findings

1. SLMs can detect phishing with reasonable accuracy.
2. Smaller models require fewer computational resources.
3. Trade-offs exist between model size, accuracy, and cost.

Abstract

Phishing websites pose a major cybersecurity threat, exploiting unsuspecting users and causing significant financial and organisational harm. Traditional machine learning approaches for phishing detection often require extensive feature engineering, continuous retraining, and costly infrastructure maintenance. At the same time, proprietary large language models (LLMs) have demonstrated strong performance in phishing-related classification tasks, but their operational costs and reliance on external providers limit their practical adoption in many business environments. This paper investigates the feasibility of small language models (SLMs) for detecting phishing websites using only their raw HTML code. A key advantage of these models is that they can be deployed on local infrastructure, providing organisations with greater control over data and operations. We systematically evaluate 15…
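The abstract describes classifying a website from its raw HTML alone with a locally deployed model. The sketch below shows the surrounding plumbing: the HTML is inserted into a prompt, truncated to fit a context budget, and the model's free-text answer is mapped back to a label. This excerpt does not give the paper's actual prompt, truncation strategy, or inference stack. The template, the `MAX_HTML_CHARS` budget, and the `generate` callable are therefore hypothetical. `generate` stands for whatever local inference call an organisation uses (prompt in, completion out).

```python
import re
from typing import Callable, Optional

# Hypothetical character budget standing in for the model's context-window limit.
MAX_HTML_CHARS = 8000

# Hypothetical zero-shot prompt; not the template used in the paper.
PROMPT_TEMPLATE = (
    "You are a security analyst. Classify the following website HTML as "
    "'phishing' or 'benign'. Answer with exactly one word.\n\n"
    "HTML:\n{html}\n\nAnswer:"
)


def build_prompt(html: str, max_chars: int = MAX_HTML_CHARS) -> str:
    """Insert the raw HTML into the template, keeping only the first max_chars characters."""
    return PROMPT_TEMPLATE.format(html=html[:max_chars])


def parse_label(response: str) -> Optional[int]:
    """Map the first 'phishing'/'benign' keyword in the answer to 1/0; None if neither appears."""
    match = re.search(r"\b(phishing|benign)\b", response.lower())
    if match is None:
        return None
    return 1 if match.group(1) == "phishing" else 0


def classify_html(html: str, generate: Callable[[str], str]) -> Optional[int]:
    """Run one local model call (hypothetical `generate`) and parse its answer into a label."""
    return parse_label(generate(build_prompt(html)))
```

Returning `None` for unparseable answers keeps malformed model outputs visible, so they can be counted separately rather than silently treated as benign. Because the whole loop runs against a local `generate`, the page content never leaves the organisation's infrastructure, which is the privacy advantage the abstract emphasises.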


Taxonomy

Topics: Spam and Phishing Detection · Advanced Malware Detection Techniques · Misinformation and Its Impacts