Multilingual Email Phishing Attacks Detection using OSINT and Machine Learning
Panharith An, Rana Shafi, Tionge Mughogho, Onyango Allan Onyango

TL;DR
This paper combines OSINT tools (Nmap and theHarvester) with machine learning to detect email phishing in multilingual datasets. Random Forest performed best, reaching 97.37% accuracy on both the English and Arabic datasets.
Contribution
It integrates OSINT-derived features (such as domain names, IP addresses, and open ports) with ML classifiers for multilingual phishing detection, and reports higher accuracy than baseline models.
Findings
Random Forest achieved the highest accuracy of the five classifiers tested: 97.37% on both the English and Arabic datasets.
OSINT features improved detection accuracy compared to baseline models.
Multilingual datasets enhance the robustness of phishing detection models.
Abstract
Email phishing remains a prevalent cyber threat, targeting victims to extract sensitive information or deploy malicious software. This paper explores the integration of open-source intelligence (OSINT) tools and machine learning (ML) models to enhance phishing detection across multilingual datasets. Using Nmap and theHarvester, this study extracted 17 features, including domain names, IP addresses, and open ports, to improve detection accuracy. Multilingual email datasets, including English and Arabic, were analyzed to address the limitations of ML models trained predominantly on English data. Experiments with five classification algorithms (Decision Tree, Random Forest, Support Vector Machine, XGBoost, and Multinomial Naïve Bayes) revealed that Random Forest achieved the highest performance, with an accuracy of 97.37% for both English and Arabic datasets. For OSINT-enhanced…
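The abstract describes turning OSINT results (domain names, IP addresses, open ports) into features for a classifier. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the feature names, and the shape of the `osint` lookup table are all assumptions made for illustration. The paper's actual 17 features are not listed here, and the step of parsing Nmap or theHarvester output is omitted.

```python
from email.utils import parseaddr


def sender_domain(from_header: str) -> str:
    """Return the lower-cased domain of the address in a From header."""
    _, addr = parseaddr(from_header)
    # rpartition yields an empty string when no address could be parsed.
    return addr.rpartition("@")[2].lower()


def build_features(from_header: str, osint: dict) -> dict:
    """Combine an email's sender domain with OSINT data into a feature dict.

    `osint` is a hypothetical lookup table keyed by domain, e.g.
    {"example.com": {"ip": "192.0.2.10", "open_ports": [25, 80]}}.
    In practice it would be filled from OSINT tool output.
    """
    domain = sender_domain(from_header)
    record = osint.get(domain, {})
    ports = record.get("open_ports", [])
    return {
        "domain": domain,
        "ip_address": record.get("ip", ""),
        "num_open_ports": len(ports),
        "has_smtp_port": int(25 in ports),   # illustrative feature only
        "osint_hit": int(domain in osint),   # 1 if the domain was found
    }


if __name__ == "__main__":
    table = {"example.com": {"ip": "192.0.2.10", "open_ports": [25, 80]}}
    print(build_features('"Support" <help@Example.com>', table))
```

A feature dictionary like this would then be encoded numerically and passed, together with text features from the email body, to a classifier such as Random Forest.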
Taxonomy
Topics: Spam and Phishing Detection
