Small Language Models for Phishing Website Detection: Cost, Performance, and Privacy Trade-Offs
Georg Goldenits, Philip Koenig, Sebastian Raubitzek, Andreas Ekelhart

TL;DR
This paper evaluates small language models for detecting phishing websites using raw HTML, analyzing their accuracy, resource needs, and cost-efficiency to explore a local, privacy-preserving alternative to large proprietary models.
Contribution
It systematically benchmarks 15 small language models for phishing detection, highlighting their trade-offs in performance and resource consumption compared to large models.
Findings
SLMs can detect phishing with reasonable accuracy
Smaller models require fewer computational resources
Trade-offs exist between model size, accuracy, and cost
Abstract
Phishing websites pose a major cybersecurity threat, exploiting unsuspecting users and causing significant financial and organisational harm. Traditional machine learning approaches for phishing detection often require extensive feature engineering, continuous retraining, and costly infrastructure maintenance. At the same time, proprietary large language models (LLMs) have demonstrated strong performance in phishing-related classification tasks, but their operational costs and reliance on external providers limit their practical adoption in many business environments. This paper investigates the feasibility of small language models (SLMs) for detecting phishing websites using only their raw HTML code. A key advantage of these models is that they can be deployed on local infrastructure, providing organisations with greater control over data and operations. We systematically evaluate 15…
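As a minimal sketch of what classifying a page from its raw HTML with a locally hosted model might look like: the prompt wording, the character budget, and the function names below are illustrative assumptions, not the authors' setup. The `generate` argument is a hypothetical stand-in for whatever local inference runtime is in use.

```python
import re
from typing import Callable

# Hypothetical character budget for the HTML excerpt; not a value from the paper.
MAX_HTML_CHARS = 8000

# Illustrative prompt wording; the paper's actual prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "You are a security analyst. Decide whether the following web page is a "
    "phishing page.\n"
    "Answer with exactly one word: PHISHING or BENIGN.\n\n"
    "HTML:\n{html}\n\nAnswer:"
)


def build_prompt(html: str, max_chars: int = MAX_HTML_CHARS) -> str:
    """Truncate the raw HTML to a fixed character budget and fill the template."""
    return PROMPT_TEMPLATE.format(html=html[:max_chars])


def parse_label(response: str) -> str:
    """Map free-form model output to 'phishing', 'benign', or 'unknown'."""
    match = re.search(r"\b(phishing|benign)\b", response, flags=re.IGNORECASE)
    return match.group(1).lower() if match else "unknown"


def classify(html: str, generate: Callable[[str], str]) -> str:
    """Classify one page; `generate` maps a prompt string to model output text."""
    return parse_label(generate(build_prompt(html)))
```

Passing the model in as a plain callable keeps the HTML on local infrastructure and lets different small models be swapped in without changing the prompt or parsing logic. Returning "unknown" for unparseable output keeps malformed responses visible instead of silently counting them as one class.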
Taxonomy
Topics: Spam and Phishing Detection · Advanced Malware Detection Techniques · Misinformation and Its Impacts
