AI Powered Image Analysis for Phishing Detection
K. Acharya, S. Ale, R. Kadel

TL;DR
This study evaluates convolutional and transformer vision models for detecting visually deceptive phishing websites using webpage screenshots, emphasizing threshold-aware evaluation and model efficiency.
Contribution
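It presents an end-to-end framework for image-based phishing detection (dataset creation, preprocessing, transfer learning with ImageNet weights, and evaluation across decision thresholds), compares ConvNeXt-Tiny and ViT-Base, and highlights the importance of threshold tuning for real-world applications.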
Findings
ConvNeXt-Tiny achieves the highest F1-score among tested models.
Threshold tuning significantly impacts detection performance.
ConvNeXt-Tiny is more computationally efficient than ViT-Base.
Abstract
Phishing websites now rely heavily on visual imitation (copied logos, similar layouts, and matching colours) to avoid detection by text- and URL-based systems. This paper presents a deep learning approach that uses webpage screenshots for image-based phishing detection. Two vision models, ConvNeXt-Tiny and Vision Transformer (ViT-Base), were tested to see how well they handle visually deceptive phishing pages. The framework covers dataset creation, preprocessing, transfer learning with ImageNet weights, and evaluation using different decision thresholds. The results show that ConvNeXt-Tiny performs the best overall, achieving the highest F1-score at the optimised threshold and running more efficiently than ViT-Base. This highlights the strength of convolutional models for visual phishing detection and shows why threshold tuning is important for real-world deployment. As future work, the…
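To make the idea of threshold-aware evaluation concrete, the sketch below sweeps a set of candidate decision thresholds over a classifier's phishing probabilities and keeps the one with the highest F1-score. It is an illustration under stated assumptions, not the authors' procedure: the function names (`f1_at_threshold`, `best_threshold`), the candidate grid of 0.05 to 0.95 in steps of 0.05, and the toy scores are all hypothetical, and the summary does not say how the paper selected its optimised threshold.

```python
from typing import List, Optional, Sequence, Tuple


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int],
                    threshold: float) -> float:
    """F1-score when a page is flagged as phishing if score >= threshold.

    `labels` uses 1 for phishing and 0 for legitimate (an assumed convention).
    """
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denominator = 2 * tp + fp + fn
    # With no true positives, false positives, or false negatives, report 0.0.
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(scores: Sequence[float], labels: Sequence[int],
                   candidates: Optional[List[float]] = None) -> Tuple[float, float]:
    """Return (threshold, f1) for the candidate with the highest F1-score."""
    if candidates is None:
        # Hypothetical grid: 0.05, 0.10, ..., 0.95.
        candidates = [i / 100 for i in range(5, 100, 5)]
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        f1 = f1_at_threshold(scores, labels, t)
        if f1 > best_f1:  # strict comparison keeps the lowest tied threshold
            best_t, best_f1 = t, f1
    return best_t, best_f1


if __name__ == "__main__":
    # Toy validation scores, invented for illustration only.
    demo_scores = [0.92, 0.81, 0.45, 0.38, 0.30, 0.12, 0.65, 0.20]
    demo_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print("F1 at 0.5:", f1_at_threshold(demo_scores, demo_labels, 0.5))
    print("Best (threshold, F1):", best_threshold(demo_scores, demo_labels))
```

In practice the sweep would be run on a held-out validation split and the chosen threshold then fixed before scoring the test set, so that the reported F1-score is not tuned on the data it is measured on.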
