Explainable Machine Learning for Phishing Detection on Heterogeneous Datasets with MCP-Enabled Deployment

Nikhil Kumar Dora; Sumit Kumar Tetarave; Rishikesh Sahay; Madhusudan Singh; Xiaoqing Li

arXiv:2605.17891 · cs.CR · May 19, 2026

TL;DR

This paper evaluates various machine learning algorithms for phishing detection on diverse datasets, integrating explainable AI techniques and MCP-based deployment for real-time URL analysis and security interpretation.

Contribution

It introduces a comprehensive approach combining ML models, explainability methods, and MCP-enabled deployment for effective phishing detection and interpretability.
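The summary mentions real-time URL analysis behind an MCP-enabled deployment, but does not list the exact features used. As a hedged illustration only, the sketch below computes a few lexical URL features of the kind commonly used in URL-based phishing detection. The function name `extract_url_features` and every feature name are hypothetical choices for this example. The paper's actual feature set and its MCP tool interface are not shown here, and no MCP SDK calls are assumed.

```python
import ipaddress
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Compute a few simple lexical features from a URL (hypothetical, illustrative only)."""
    parsed = urlparse(url)
    # .hostname drops credentials ("user@"), the port, and IPv6 brackets.
    host = parsed.hostname or ""

    # Flag URLs whose host is a raw IPv4/IPv6 address instead of a domain name.
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0

    return {
        "host_is_ip": host_is_ip,
        "url_length": len(url),
        # An '@' can hide the real destination, e.g. http://bank.example@evil.example/
        "has_at_symbol": int("@" in url),
        "hyphen_in_host": int("-" in host),
        "num_host_dots": host.count("."),
        "uses_https": int(parsed.scheme == "https"),
    }


if __name__ == "__main__":
    for u in ["http://192.168.0.1/login", "https://secure-login.example.com/account"]:
        print(u, extract_url_features(u))
```

In a deployment like the one described, a function of this shape could sit behind a service endpoint that feeds a trained classifier; how the authors actually wire this up is not stated in the summary.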

Findings

1. CatBoost achieved 95.01% accuracy among ensemble models.
2. DistilBERT achieved 99.78% accuracy among transformer models.
3. Logistic Regression achieved 92.44% accuracy among classical models.

Abstract

With the growth of digital transformation and Internet usage, social engineering techniques such as phishing have become a major concern for users and organizations. Phishing attacks use deceptive techniques to trick users into revealing confidential information, causing financial loss and reputational damage to organizations. According to a Verizon report, 36% of all data breaches involved phishing, highlighting the need for intelligent, adaptive, and explainable security mechanisms. This paper examines the effectiveness of different machine learning algorithms for phishing detection on heterogeneous phishing datasets, which include a publicly available UCI dataset, datasets we generated using tools such as EvilGinx and Zphisher, and AI-generated datasets. Moreover, this work incorporates explainable AI (XAI) techniques such as Information Gain, SHAP (SHapley Additive…
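The abstract names Information Gain as one of the XAI techniques used to interpret the models. A minimal sketch, not the authors' implementation, of the standard definition IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v) for one discrete feature X and class label Y follows. The feature `has_ip_address` and the eight toy samples are hypothetical and are not drawn from any of the paper's datasets.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        # Labels of the samples where the feature takes value v.
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical toy data: 1 = URL host is an IP address; label 1 = phishing, 0 = legitimate.
has_ip_address = [1, 1, 1, 0, 0, 0, 0, 0]
label = [1, 1, 0, 1, 0, 0, 0, 0]

print(round(information_gain(has_ip_address, label), 3))  # → 0.159
```

Ranking every feature by this score gives a model-agnostic view of which inputs carry the most information about the phishing label. SHAP, by contrast, attributes individual predictions of a trained model to its input features.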
