Malicious Web Domain Identification using Online Credibility and Performance Data by Considering the Class Imbalance Issue
Zhongyi Hu, Raymond Chiong, Ilung Pranata, Yukun Bao, Yuqing Lin

TL;DR
This paper presents a machine learning-based method for identifying malicious web domains using online credibility and performance data, effectively addressing class imbalance with a novel integrated resampling approach combining SMOTE and PSO.
Contribution
It introduces an integrated resampling technique using SMOTE and PSO to improve malicious web domain detection in imbalanced datasets.
Findings
The proposed approach outperforms five other resampling methods.
Effective in handling different imbalance ratios in real-world datasets.
Enhances malicious web domain detection accuracy using online credibility data.
Abstract
Purpose: Malicious web domain identification is of significant importance to the security protection of Internet users. With online credibility and performance data, this paper aims to investigate the use of machine learning techniques for malicious web domain identification by considering the class imbalance issue (i.e., there are more benign web domains than malicious ones).
Design/methodology/approach: We propose an integrated resampling approach to handle class imbalance by combining the Synthetic Minority Over-sampling Technique (SMOTE) and Particle Swarm Optimisation (PSO), a population-based meta-heuristic algorithm. We use SMOTE for over-sampling and PSO for under-sampling.
Findings: By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain datasets with different…
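The abstract says only that PSO handles the under-sampling step. The sketch below shows one common way PSO can be used for instance selection. It is a minimal illustration, not the authors' method. It assumes a binary PSO in which each particle is a 0/1 mask over the majority (benign) samples, where 1 means "keep". Each mask is scored by the G-mean of a 1-nearest-neighbour classifier on a held-out validation set. The function names (`pso_undersample`, `g_mean`, `one_nn_predict`), the label convention (benign = 0, malicious = 1), the fitness choice, and all parameter values (`w`, `c1`, `c2`, swarm size, iterations) are hypothetical.

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (minority/malicious class = 1)."""
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return float(np.sqrt(tpr * tnr))

def one_nn_predict(X_train, y_train, X_test):
    """Brute-force 1-nearest-neighbour classifier, used only as a cheap fitness model."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=1)]

def pso_undersample(X_maj, X_min, X_val, y_val, n_particles=20, n_iter=30,
                    w=0.7, c1=1.5, c2=1.5, rng=None):
    """Hypothetical binary-PSO under-sampler: returns the kept majority samples
    and the best validation G-mean found."""
    rng = np.random.default_rng(rng)
    n = len(X_maj)

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0  # an empty majority set is never a useful solution
        X_tr = np.vstack([X_maj[mask == 1], X_min])
        y_tr = np.r_[np.zeros(int(mask.sum()), dtype=int), np.ones(len(X_min), dtype=int)]
        return g_mean(y_val, one_nn_predict(X_tr, y_tr, X_val))

    # Random initial masks and velocities
    pos = (rng.random((n_particles, n)) < 0.5).astype(int)
    vel = rng.uniform(-1.0, 1.0, (n_particles, n))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    gbest_fit = pbest_fit.max()

    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, n))
        # Standard PSO velocity update toward personal and global bests
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -4.0, 4.0)  # stop the sigmoid from saturating
        # Sigmoid transfer: each velocity becomes the probability that the bit is 1
        pos = (rng.random((n_particles, n)) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        if pbest_fit.max() > gbest_fit:
            gbest_fit = pbest_fit.max()
            gbest = pbest[np.argmax(pbest_fit)].copy()

    return X_maj[gbest == 1], float(gbest_fit)

if __name__ == "__main__":
    # Synthetic toy data, not the paper's web domain datasets
    rng = np.random.default_rng(0)
    X_maj = rng.normal(0.0, 1.0, (60, 2))
    X_min = rng.normal(2.5, 1.0, (10, 2))
    X_val = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(2.5, 1.0, (10, 2))])
    y_val = np.r_[np.zeros(30, dtype=int), np.ones(10, dtype=int)]
    kept, best = pso_undersample(X_maj, X_min, X_val, y_val, n_particles=10, n_iter=10, rng=1)
    print(len(kept), "of", len(X_maj), "majority samples kept; G-mean", round(best, 3))
```

In an integrated scheme, a selection step like this could be applied to the majority class while SMOTE enlarges the minority class. How the paper orders or couples the two steps is not stated in this summary. G-mean is used here because plain accuracy rewards ignoring the minority class on imbalanced data.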
Taxonomy
Methods: Synthetic Minority Over-sampling Technique (SMOTE)
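SMOTE creates synthetic minority samples by interpolation. Each new sample lies on the line segment between a minority sample and one of its k nearest minority-class neighbours. The following minimal NumPy sketch shows this basic scheme. The function name `smote`, the default `k=5`, and the brute-force distance computation are illustrative choices, not the paper's implementation.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, rng=None):
    """Basic SMOTE sketch: interpolate between random minority samples and
    one of their k nearest minority neighbours (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances among minority samples only
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each synthetic point
    base = rng.integers(0, n, size=n_synthetic)
    nn = neighbours[base, rng.integers(0, k, size=n_synthetic)]

    # Random position along the segment base -> neighbour, gap in [0, 1)
    gap = rng.random((n_synthetic, 1))
    return X_min[base] + gap * (X_min[nn] - X_min[base])

if __name__ == "__main__":
    X_min = np.random.default_rng(0).normal(size=(20, 3))
    X_new = smote(X_min, n_synthetic=50, k=5, rng=1)
    print(X_new.shape)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region spanned by the minority class. This keeps SMOTE from simply duplicating minority points, unlike random over-sampling.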
