PhishKey: A Novel Centroid-Based Approach for Enhanced Phishing Detection Using Adaptive HTML Component Extraction
Felipe Castaño, Eduardo Fidalgo, Enrique Alegre, Rocio Alaiz-Rodríguez, Raul Orduna, Francesco Zola

TL;DR
PhishKey is an innovative phishing detection system that combines character-level URL analysis with HTML content extraction, utilizing CNNs and centroid-based methods to improve accuracy, robustness, and resistance to adversarial attacks.
Contribution
The paper introduces PhishKey, a hybrid approach integrating CNN-based URL classification with centroid-based HTML component extraction for enhanced phishing detection.
Findings
Achieves up to 98.70% F1 Score on multiple datasets
Demonstrates strong resistance to adversarial injection attacks
Provides a robust, efficient detection method combining multiple features
Abstract
Phishing attacks pose a significant cybersecurity threat, evolving rapidly to bypass detection mechanisms and exploit human vulnerabilities. This paper introduces PhishKey to address the challenges of adaptability, robustness, and efficiency. PhishKey is a novel phishing detection method using automatic feature extraction from hybrid sources. PhishKey combines character-level processing with Convolutional Neural Networks (CNN) for URL classification, and a Centroid-Based Key Component Phishing Extractor (CAPE) for HTML content at the word level. CAPE reduces noise and processes each sample in full, avoiding crop operations on the input data. The predictions from both modules are integrated using a soft-voting ensemble to achieve more accurate and reliable classifications. Experimental evaluations on four state-of-the-art datasets demonstrate the effectiveness of PhishKey. It…
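The soft-voting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract only states that the two modules' predictions are combined by soft voting; the function name `soft_vote`, the equal default weights, and the 0.5 decision threshold below are assumptions made for illustration and are not taken from the paper.

```python
def soft_vote(p_url, p_html, w_url=0.5, w_html=0.5, threshold=0.5):
    """Combine two phishing probabilities by weighted averaging.

    p_url  : phishing probability from a URL classifier (hypothetical input)
    p_html : phishing probability from an HTML classifier (hypothetical input)
    Weights and threshold are illustrative assumptions, not the paper's values.
    """
    for p in (p_url, p_html):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    total = w_url + w_html
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Soft voting averages class probabilities instead of counting hard labels.
    p_phish = (w_url * p_url + w_html * p_html) / total
    return p_phish, p_phish >= threshold


if __name__ == "__main__":
    # The URL module is confident, the HTML module is not; the average decides.
    score, is_phishing = soft_vote(0.9, 0.3)
    print(f"combined score: {score:.2f}, flagged as phishing: {is_phishing}")
```

Averaging probabilities rather than hard labels lets a confident module outweigh an uncertain one, which is the usual motivation for soft voting over majority voting when only two classifiers are combined.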
Taxonomy
Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Web Data Mining and Analysis
Methods: Umbrella Reinforcement Learning
