A New Dataset and Methodology for Malicious URL Classification
Ilan Schvartzman, Roei Sarussi, Maor Ashkenazi, Ido Kringel, Yaniv Tocker, Tal Furman Shohet

TL;DR
This paper introduces DeepURLBench, a comprehensive multi-class dataset for malicious URL classification, and enhances URLNet with DNS features to improve accuracy and real-time performance in cybersecurity.
Contribution
The paper presents a new multi-class dataset for malicious URLs and improves URLNet with DNS features, advancing real-time classification capabilities.
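The multi-class versus binary framing can be made concrete with a small sketch. The three class names come from the paper's summary. Everything else here (the integer ids, the helper names `to_binary` and `encode_url`, the padding length, and the code-point-based character vocabulary) is an illustrative assumption, not the authors' preprocessing. The encoding only shows the general idea of feeding a URL string to a string-based model such as URLNet as a fixed-length sequence of character ids.

```python
from typing import Dict, List

# Class names are taken from the summary; the integer ids are an assumption.
CLASSES: List[str] = ["benign", "phishing", "malicious"]
CLASS_TO_ID: Dict[str, int] = {name: i for i, name in enumerate(CLASSES)}


def to_binary(label: str) -> int:
    """Collapse a three-class label into a binary one (0 = benign, 1 = not benign)."""
    if label not in CLASS_TO_ID:
        raise ValueError(f"unknown label: {label!r}")
    return 0 if label == "benign" else 1


def encode_url(url: str, max_len: int = 200) -> List[int]:
    """Turn a URL into a fixed-length list of character ids.

    Hypothetical scheme: id = min(code point, 255) + 1, so id 0 is
    reserved for padding and every non-Latin-1 character shares id 256.
    """
    ids = [min(ord(ch), 255) + 1 for ch in url[:max_len]]
    # Right-pad with zeros so every URL yields the same length.
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    sample = "http://example.com/login"
    print(CLASS_TO_ID["phishing"], to_binary("phishing"))
    print(encode_url(sample, max_len=32))
```

A multi-class model would train against `CLASS_TO_ID` directly, while a binary baseline would train against `to_binary`, which discards the distinction between phishing and other malicious URLs.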
Findings
DeepURLBench is presented as a cleaner, better-structured alternative to existing malicious URL datasets.
URLNet enhanced with DNS features shows significant accuracy improvements.
The enhanced model retains real-time efficiency.
Abstract
Malicious URL (Uniform Resource Locator) classification is a pivotal aspect of cybersecurity, offering defense against web-based threats. Despite deep learning's promise in this area, its advancement is hindered by two main challenges: the scarcity of comprehensive, open-source datasets and the limitations of existing models, which either lack real-time capabilities or exhibit suboptimal performance. To address these gaps, we introduce a novel multi-class dataset for malicious URL classification, named DeepURLBench, which distinguishes between benign, phishing, and malicious URLs. The data has been rigorously cleansed and structured, providing a superior alternative to existing datasets. Notably, the multi-class approach enhances the performance of deep learning models compared with a standard binary classification approach. Additionally, we propose improvements to string-based…
Taxonomy
Topics: Spam and Phishing Detection
Methods: Umbrella Reinforcement Learning
