Phishing Website Detection through Multi-Model Analysis of HTML Content
Furkan Çolhak, Mert İlhan Ecevit, Bilal Emir Uçar, Reiner Creutzburg, Hasan Dağ

TL;DR
This paper presents a multi-model approach combining NLP and MLP techniques for detecting phishing websites based on HTML content, achieving high accuracy and creating a new, relevant dataset.
Contribution
It introduces a novel multi-model fusion method for phishing detection and provides an up-to-date, real-world dataset for the research community.
Findings
The proposed MultiText-LP model achieves an F1 score of 96.80.
Fusion of NLP and MLP models outperforms existing methods.
CANINE and RoBERTa models excel in analyzing titles and content.
Abstract
The way we communicate and work has changed significantly with the rise of the Internet. While it has opened up new opportunities, it has also brought about an increase in cyber threats. One common and serious threat is phishing, where cybercriminals employ deceptive methods to steal sensitive information. This study addresses the pressing issue of phishing by introducing an advanced detection model that focuses on HTML content. Our proposed approach integrates a specialized Multi-Layer Perceptron (MLP) model for structured tabular data and two pretrained Natural Language Processing (NLP) models for analyzing textual features such as page titles and content. The embeddings from these models are combined through a novel fusion process. The resulting fused embeddings are then input into a linear classifier. Recognizing the scarcity of recent datasets for…
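The data flow described in the abstract (a tabular MLP branch, two text branches, fusion, then a linear classifier) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the fusion process, so simple concatenation is assumed here; `toy_text_embed` is a hypothetical stand-in for the pretrained encoders; all weights are random rather than trained; and the feature values and dimensions are invented for the example.

```python
import math
import random

random.seed(0)  # deterministic weights for the illustration


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out):
    """Random weights standing in for trained parameters (assumption)."""
    weights = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_embed(features, layer):
    """Tabular branch: a single hidden layer with ReLU."""
    return [max(0.0, h) for h in linear(features, *layer)]


def toy_text_embed(text, dim):
    """Hypothetical placeholder for a pretrained text encoder:
    a normalized character-bucket count vector."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fuse(*embeddings):
    """Assumed fusion: plain concatenation of the branch embeddings."""
    return [v for emb in embeddings for v in emb]


def predict(tabular, title, content, mlp_layer, clf_layer, text_dim=8):
    """Run all three branches, fuse them, and apply a linear classifier."""
    fused = fuse(
        mlp_embed(tabular, mlp_layer),
        toy_text_embed(title, text_dim),
        toy_text_embed(content, text_dim),
    )
    logit = linear(fused, *clf_layer)[0]
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid score in (0, 1)


TABULAR_DIM, HIDDEN_DIM, TEXT_DIM = 4, 6, 8  # illustrative sizes only
mlp_layer = init_layer(TABULAR_DIM, HIDDEN_DIM)
clf_layer = init_layer(HIDDEN_DIM + 2 * TEXT_DIM, 1)

score = predict(
    tabular=[3.0, 1.0, 0.0, 12.0],  # hypothetical HTML-derived counts
    title="Sign in to your account",
    content="Please verify your password to continue.",
    mlp_layer=mlp_layer,
    clf_layer=clf_layer,
)
print(f"phishing score: {score:.3f}")
```

Because the weights are untrained, the printed score carries no meaning; the sketch only shows how the three embeddings meet in one fused vector before classification.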
Taxonomy
Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Advanced Malware Detection Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Attention Dropout · WordPiece · Weight Decay · Linear Warmup With Linear Decay · BERT · Linear Layer · Softmax
