Malicious URL Detection using Machine Learning: A Survey
Doyen Sahoo, Chenghao Liu, and Steven C.H. Hoi

TL;DR
This survey reviews machine learning techniques for malicious URL detection, highlighting their advantages over traditional blacklists, and discusses challenges and future research directions in cybersecurity applications.
Contribution
It provides a comprehensive classification and analysis of machine learning-based malicious URL detection methods, including feature representation and algorithm design.
Findings
Machine learning improves detection of new malicious URLs.
Traditional blacklists are insufficient for comprehensive detection.
The survey identifies open challenges and future research directions.
Abstract
Malicious URL, a.k.a. malicious website, is a common and serious threat to cybersecurity. Malicious URLs host unsolicited content (spam, phishing, drive-by exploits, etc.) and lure unsuspecting users to become victims of scams (monetary loss, theft of private information, and malware installation), and cause losses of billions of dollars every year. It is imperative to detect and act on such threats in a timely manner. Traditionally, this detection is done mostly through the usage of blacklists. However, blacklists cannot be exhaustive, and lack the ability to detect newly generated malicious URLs. To improve the generality of malicious URL detectors, machine learning techniques have been explored with increasing attention in recent years. This article aims to provide a comprehensive survey and a structural understanding of Malicious URL Detection techniques using machine learning. We…
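As an illustration of the blacklist limitation described above, the following minimal sketch contrasts an exact-match blacklist lookup with simple lexical features that can be computed for any URL, including ones never seen before. It is not the survey's method: the blacklist contents, the example URLs, and the function names `in_blacklist` and `lexical_features` are all hypothetical, and the feature set is only a small sample of the kind of string-based features a learned detector might consume.

```python
from urllib.parse import urlparse

# Hypothetical blacklist of known-bad URLs (illustrative values only).
BLACKLIST = {"http://paypa1-login.example.com/verify"}


def in_blacklist(url: str) -> bool:
    """Exact-match lookup: flags only URLs that were listed beforehand."""
    return url in BLACKLIST


def lexical_features(url: str) -> dict:
    """Compute simple string-based features that exist for any URL."""
    host = urlparse(url).hostname or ""
    return {
        "url_length": len(url),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
    }


# A listed URL and a slightly altered, newly generated variant (both made up).
known = "http://paypa1-login.example.com/verify"
variant = "http://paypa1-login2.example.com/verify"

known_hit = in_blacklist(known)      # the listed URL is caught
variant_hit = in_blacklist(variant)  # the unseen variant slips through
variant_features = lexical_features(variant)
```

The variant differs from the listed URL by one character, so the lookup misses it, yet its lexical features remain available as input to a classifier trained on labeled URLs. Which features and which learning algorithm work well is the subject the survey covers; this sketch assumes nothing about that choice.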
Taxonomy
Topics: Spam and Phishing Detection · Advanced Malware Detection Techniques · Network Security and Intrusion Detection
