DNS Typo-squatting Domain Detection: A Data Analytics & Machine Learning Based Approach
Abdallah Moubayed, MohammadNoor Injadat, Abdallah Shami, Hanan Lutfiyya

TL;DR
This paper presents a machine learning approach combined with data analytics to detect typosquatting domains in DNS, improving detection accuracy and reducing false positives.
Contribution
It introduces an ensemble learning classifier using multiple algorithms and validates trends with clustering, enhancing typosquatting detection in DNS.
Findings
Ensemble classifier achieves high accuracy, precision, and F-score.
Legitimate domains tend to have shorter names and fewer unique characters.
Detection reduces suspicious domains by nearly five times while maintaining feature trend consistency.
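The two name-based features mentioned in the findings (name length and number of unique characters) can be computed directly from a domain string. The sketch below is illustrative only: the function name `name_features` is hypothetical, and treating the leftmost label as the "name" is an assumption, not necessarily how the paper extracts its eight features.

```python
def name_features(domain: str) -> dict:
    """Compute two illustrative name-based features for a domain string."""
    # Assumption: the "name" is the leftmost label, e.g. "example" in "example.com".
    name = domain.lower().strip(".").split(".")[0]
    return {
        "length": len(name),             # number of characters in the name
        "unique_chars": len(set(name)),  # number of distinct characters
    }
```

Under this assumption, a short name with repeated letters yields low values for both features, which is the trend the findings associate with legitimate domains.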
Abstract
Domain Name System (DNS) is a crucial component of current IP-based networks as it is the standard mechanism for name to IP resolution. However, due to its lack of data integrity and origin authentication processes, it is vulnerable to a variety of attacks. One such attack is Typosquatting. Detecting this attack is particularly important as it can be a threat to corporate secrets and can be used to steal information or commit fraud. In this paper, a machine learning-based approach is proposed to tackle the typosquatting vulnerability. To that end, exploratory data analytics is first used to better understand the trends observed in eight domain name-based extracted features. Furthermore, a majority voting-based ensemble learning classifier built using five classification algorithms is proposed that can detect suspicious domains with high accuracy. Moreover, the observed trends are…
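As a rough illustration of majority voting over five classifiers, the sketch below combines five binary votes into one label. The five threshold rules are hypothetical stand-ins chosen for this example; the abstract does not name the paper's five classification algorithms or their parameters, so none of the values here come from the paper.

```python
from typing import Callable, Sequence

# Each classifier maps a feature dict to 1 (suspicious) or 0 (legitimate).
Classifier = Callable[[dict], int]


def majority_vote(classifiers: Sequence[Classifier], features: dict) -> int:
    """Return 1 if more than half of the classifiers vote 1, otherwise 0."""
    votes = [clf(features) for clf in classifiers]
    return int(2 * sum(votes) > len(votes))


# Hypothetical toy rules standing in for trained models (assumption).
toy_classifiers = [
    lambda f: int(f["length"] > 12),
    lambda f: int(f["unique_chars"] > 8),
    lambda f: int(f["length"] > 20),
    lambda f: int(f["unique_chars"] > 10),
    lambda f: int(f["length"] > 15),
]
```

With an odd number of binary voters a tie cannot occur, which is one practical reason to combine five models rather than four.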
Taxonomy
Methods: k-Means Clustering
