PhishDef: URL Names Say It All
Anh Le, Athina Markopoulou, Michalis Faloutsos

TL;DR
PhishDef is a lightweight, online phishing URL detection system that relies solely on lexical URL features, demonstrating high accuracy, robustness to noisy data, and suitability for real-time deployment.
Contribution
The paper introduces PhishDef, a novel phishing detection system that uses only lexical URL features. After comparing several classification algorithms, the authors adopt an online method (AROW) for its robustness to noisy training data.
Findings
Lexical features are sufficient for effective phishing detection.
The AROW algorithm improves resilience to noisy training data.
PhishDef outperforms state-of-the-art approaches on real datasets.
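
To make "lexical features" concrete, here is a minimal Python sketch of extracting features from the URL string alone, with no page fetch or DNS lookup. The function name `lexical_features` and the specific features (lengths, dot count, an IPv4 check, bag-of-words tokens) are illustrative assumptions; this summary does not list the feature set the paper actually uses.

```python
import re
from urllib.parse import urlsplit

# Hypothetical helper: the feature names below are illustrative assumptions,
# not the feature set from the paper.
def lexical_features(url: str) -> dict:
    """Map a URL string to a sparse feature dict using only its text."""
    # urlsplit needs a scheme to locate the host, so add one if it is missing.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    path = parts.path or ""

    feats = {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "path_length": float(len(path)),
        "num_dots_in_host": float(host.count(".")),
        # A raw IPv4 address in place of a domain name.
        "host_is_ipv4": 1.0 if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host) else 0.0,
    }

    # Bag-of-words tokens, kept separate per URL component.
    for tok in re.split(r"[^A-Za-z0-9]+", host):
        if tok:
            feats["host_tok=" + tok.lower()] = 1.0
    for tok in re.split(r"[^A-Za-z0-9]+", path + " " + parts.query):
        if tok:
            feats["path_tok=" + tok.lower()] = 1.0
    return feats


if __name__ == "__main__":
    for name, value in sorted(lexical_features("http://192.168.0.1/paypal/login.php").items()):
        print(name, value)
```

A sparse dict keeps the representation open-ended: tokens never seen before simply become new keys, which suits a classifier that is updated online.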
Abstract
Phishing is an increasingly sophisticated method to steal personal user information using sites that pretend to be legitimate. In this paper, we take the following steps to identify phishing URLs. First, we carefully select lexical features of the URLs that are resistant to obfuscation techniques used by attackers. Second, we evaluate the classification accuracy when using only lexical features, both automatically selected and hand-selected, vs. when using additional features. We show that lexical features are sufficient for all practical purposes. Third, we thoroughly compare several classification algorithms, and we propose to use an online method (AROW) that is able to overcome noisy training data. Based on the insights gained from our analysis, we propose PhishDef, a phishing detection system that uses only URL names and combines the above three elements. PhishDef is a highly accurate method…
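
The online method named in the abstract, AROW (Adaptive Regularization of Weights), keeps a mean weight vector together with a per-weight confidence and updates both one example at a time. The sketch below is a simplified diagonal-covariance variant over sparse feature dicts, written for illustration; the class name `DiagonalAROW`, the default `r = 1.0`, and the toy features are assumptions and are not taken from the paper's implementation.

```python
# A simplified sketch of AROW with a diagonal covariance approximation.
# Assumption: labels are +1 (phishing) and -1 (benign); features are sparse dicts.
class DiagonalAROW:
    def __init__(self, r: float = 1.0):
        self.r = r        # regularization: larger r means smaller updates
        self.mu = {}      # mean weight per feature (default 0.0)
        self.sigma = {}   # variance per feature (default 1.0 = low confidence)

    def score(self, x: dict) -> float:
        return sum(self.mu.get(k, 0.0) * v for k, v in x.items())

    def predict(self, x: dict) -> int:
        return 1 if self.score(x) >= 0 else -1

    def update(self, x: dict, y: int) -> None:
        margin = y * self.score(x)
        if margin >= 1.0:
            return  # confidently correct already, so leave the model alone
        # Uncertainty of the model about this example: x^T Sigma x.
        confidence = sum(self.sigma.get(k, 1.0) * v * v for k, v in x.items())
        beta = 1.0 / (confidence + self.r)
        alpha = (1.0 - margin) * beta  # hinge loss scaled by beta
        for k, v in x.items():
            s = self.sigma.get(k, 1.0)
            # Move the mean more along features the model is unsure about.
            self.mu[k] = self.mu.get(k, 0.0) + alpha * y * s * v
            # Shrink the variance of every feature seen in this example.
            self.sigma[k] = s - beta * s * s * v * v


if __name__ == "__main__":
    model = DiagonalAROW(r=1.0)
    stream = [
        ({"path_tok=paypal": 1.0, "host_is_ipv4": 1.0}, 1),
        ({"host_tok=example": 1.0, "path_tok=docs": 1.0}, -1),
    ]
    for features, label in stream:
        model.update(features, label)
    print(model.predict({"path_tok=paypal": 1.0, "host_is_ipv4": 1.0}))
```

Because the step size shrinks as the variance of a feature falls, a single mislabeled example cannot swing a well-established weight very far, which is the intuition behind the robustness to noisy training data that the abstract describes.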
