Explainable Machine Learning for Phishing Detection on Heterogeneous Datasets with MCP-Enabled Deployment
Nikhil Kumar Dora, Sumit Kumar Tetarave, Rishikesh Sahay, Madhusudan Singh, Xiaoqing Li

TL;DR
This paper evaluates various machine learning algorithms for phishing detection on diverse datasets, integrating explainable AI techniques and MCP-based deployment for real-time URL analysis and security interpretation.
Contribution
It introduces a comprehensive approach combining ML models, explainability methods, and MCP-enabled deployment for effective phishing detection and interpretability.
Findings
CatBoost achieved 95.01% accuracy among ensemble models.
DistilBERT achieved 99.78% accuracy among transformer models.
Logistic Regression achieved 92.44% accuracy among classical models.
Abstract
With the growth of digital transformation and Internet usage, social engineering techniques such as phishing have become a major concern for users and organizations. Phishing attacks use deceptive techniques to trick users into revealing confidential information, causing financial loss and reputational damage to organizations. According to a Verizon report, 36% of all data breaches involved phishing, highlighting the need for intelligent, adaptive, and explainable security mechanisms. This paper examines the efficiency of different machine learning algorithms for phishing detection on heterogeneous phishing datasets, which include a publicly available UCI dataset, datasets we generated using tools such as EvilGinx and Zphisher, and AI-generated datasets. Moreover, this work incorporates explainable AI (XAI) techniques such as Information Gain, SHAP (SHapley Additive…
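Information Gain, one of the XAI techniques named in the abstract, ranks a feature by how much knowing its value reduces uncertainty about the class label. The following is a minimal sketch of that computation, not the authors' implementation: the feature names (has_ip_in_url, uses_https) and the toy labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(feature_values, labels):
    """Label entropy minus the weighted label entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data (assumption): 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
has_ip_in_url = [1, 1, 1, 0, 0, 0, 0, 0]  # mostly set on phishing rows
uses_https = [1, 0, 1, 0, 1, 0, 1, 0]     # evenly spread across both classes

ig_ip = information_gain(has_ip_in_url, labels)
ig_https = information_gain(uses_https, labels)

print(f"has_ip_in_url: {ig_ip:.4f}")
print(f"uses_https:    {ig_https:.4f}")
```

In this toy data the first feature separates the classes fairly well and receives a higher score, while the second is independent of the label and scores zero. A real pipeline would compute the same quantity over the features extracted from the phishing datasets.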
