Malicious URL Detection using optimized Hist Gradient Boosting Classifier based on grid search method
Mohammad Maftoun, Nima Shadkam, Seyedeh Somayeh Salehi Komamardakhi, Zulkefli Mansor, Javad Hassannataj Joloudari

TL;DR
This paper proposes an optimized Hist Gradient Boosting Classifier using grid search for detecting malicious URLs, demonstrating superior performance over other machine learning models on a balanced dataset.
Contribution
It introduces an HGBC model whose hyperparameters are tuned with grid search for malicious URL detection, combined with data balancing and feature normalization, and reports improved accuracy and other evaluation metrics.
Findings
HGBC achieved the highest accuracy among tested models
Data balancing with SMOTE improved classifier performance
Optimized hyperparameters enhanced model effectiveness
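To make the data-balancing finding concrete, here is a minimal NumPy sketch of the idea behind SMOTE: synthetic minority samples are created by interpolating between a minority point and one of its nearest minority neighbours. The function name `smote_like_oversample`, its parameters, and the toy data are hypothetical; this is a simplified illustration, not the implementation used in the paper or in any particular library.

```python
import numpy as np

def smote_like_oversample(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority samples.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a random base point and one of its k nearest neighbours.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    # Place each synthetic point at a random position on the connecting segment.
    gaps = rng.random((n_new, 1))
    return X_minority[base] + gaps * (X_minority[picked] - X_minority[base])

# Toy imbalanced data: 90 majority rows, 10 minority rows (assumed values).
rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(90, 4))
X_minor = rng.normal(2.0, 1.0, size=(10, 4))

X_new = smote_like_oversample(X_minor, n_new=len(X_major) - len(X_minor))
X_balanced = np.vstack([X_major, X_minor, X_new])
y_balanced = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(X_new)))
print(np.bincount(y_balanced))
```

In practice, oversampling of this kind is usually applied to the training split only, so that evaluation data contains no synthetic samples.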
Abstract
Trusting the accuracy of data entered on online platforms can be difficult because malicious websites may gather information for unlawful purposes. The presence of such sites makes it challenging to analyze each website individually and hard to list all malicious Uniform Resource Locators (URLs) on a blacklist efficiently. This ongoing challenge emphasizes the crucial need for strong security measures to safeguard against potential threats and unauthorized data collection. To detect the risk posed by malicious websites, it is proposed to utilize Machine Learning (ML)-based techniques. To this end, we used several ML techniques such as Hist Gradient Boosting Classifier (HGBC), K-Nearest Neighbor (KNN), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Multi-Layer Perceptron (MLP), Light Gradient Boosting Machine (LGBM), and Support Vector Machine…
Taxonomy
Topics: Spam and Phishing Detection · Advanced Malware Detection Techniques
Methods: Sparse Evolutionary Training · Logistic Regression
