Explainable phishing website detection for secure and sustainable cyber infrastructure

Tanzila Kehkashan; Maha Abdelhaq; Ahmad Sami Al-Shamayleh; Nazish Huda; Imran Ashraf Yaseen; Abdelmuttlib Ibrahim Abdalla Ahmed; Adnan Akhunzada

PMC · DOI: 10.1038/s41598-025-27984-w · November 25, 2025


Open Access

TL;DR

This paper proposes an explainable phishing website detection system that combines machine learning with SHAP (SHapley Additive exPlanations) to make detection both accurate and interpretable, supporting secure cyber infrastructure.

Contribution

The novelty lies in using SHAP-based feature selection with URL-based models for interpretable and accurate phishing detection.
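SHAP-based feature selection ranks features by the average magnitude of their Shapley attributions and keeps the top-ranked ones. The sketch below illustrates that idea with the standard library only. It is not the paper's pipeline and not the `shap` library's API: it computes exact Shapley values by enumerating every coalition (exponential in the number of features, so only usable for a handful), it assumes a single baseline input stands in for "absent" features, and the `score` function is a hypothetical stand-in for a trained random forest. The `shap` library's tree explainers compute Shapley-based attributions far more efficiently for tree ensembles.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Set, Tuple

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x), with 'absent' features set to baseline values."""
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition keep x's values; the rest fall back to the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for s in combinations(others, size):
                coalition = set(s)
                # Weighted marginal contribution of feature i to this coalition.
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def select_top_k(model: Model, X: Sequence[Sequence[float]], baseline: Sequence[float],
                 k: int) -> Tuple[List[int], List[float]]:
    """Rank features by mean |Shapley value| over X and return the top-k indices."""
    n = len(baseline)
    mean_abs = [0.0] * n
    for x in X:
        for j, p in enumerate(shapley_values(model, x, baseline)):
            mean_abs[j] += abs(p) / len(X)
    ranked = sorted(range(n), key=lambda j: mean_abs[j], reverse=True)
    return ranked[:k], mean_abs


def score(z: Sequence[float]) -> float:
    # Hypothetical linear "phishing score" standing in for a trained random forest.
    return 0.5 * z[0] + 0.2 * z[1] - 0.3 * z[2]


if __name__ == "__main__":
    X = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
    top, importance = select_top_k(score, X, baseline=[0.0, 0.0, 0.0], k=2)
    print(top, importance)
```

For a linear model each Shapley value reduces to w_i * (x_i - b_i), which makes the sketch easy to sanity-check, and for any model the values sum to model(x) - model(baseline) (the efficiency property). A detector trained only on the selected features is then smaller and cheaper to run.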

Findings

1. The random forest model achieved 97% accuracy in phishing detection.
2. SHAP made the model interpretable by highlighting the most influential URL-based features.
3. Because it relies only on URL-based features, the proposed system is efficient enough for resource-constrained devices.

Abstract

Phishing is a social engineering attack and a type of cybercrime that is constantly and dangerously on the rise. Phishing attacks can affect many sectors, including the governmental, social, and financial sectors as well as individual businesses. Traditional methods of identifying phishing websites, such as blacklist and heuristic approaches, often fail to provide sufficient protection. In addition, techniques that combine URL, webpage-content, and external features are time-consuming, require substantial computing power, and are unsuitable for resource-constrained devices. Previous research has also often overlooked which features matter for detection and how each one affects the outcome; traditional methods may not fully capture the significance of individual features. To overcome this issue, this research applies feature selection techniques,…
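The abstract argues that combining URL, page-content, and external features is too costly for constrained devices. To illustrate why URL-only features are cheap, here is a minimal sketch that derives a few lexical features from the URL string alone, with no network request or page download. The feature names and the `SUSPICIOUS_TOKENS` list are hypothetical illustrations, not the paper's feature set.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical keyword list for illustration; not the paper's feature set.
SUSPICIOUS_TOKENS = ("login", "verify", "account", "update", "secure", "bank")


def url_features(url: str) -> dict:
    """Compute simple lexical features from the URL string alone (no page fetch)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname is lowercased, port and userinfo stripped

    # A raw IP address instead of a domain name is a common phishing signal.
    try:
        ipaddress.ip_address(host)
        has_ip_host = 1
    except ValueError:
        has_ip_host = 0

    lowered = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "has_ip_host": has_ip_host,
        "uses_https": int(parsed.scheme == "https"),
        "suspicious_tokens": sum(tok in lowered for tok in SUSPICIOUS_TOKENS),
    }


if __name__ == "__main__":
    print(url_features("http://192.168.0.1/secure-login/verify.php"))
```

Every feature here is a constant-time or linear scan over a short string, which is the kind of cost profile that makes URL-based detection attractive on low-power devices.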


Figures (9)



Taxonomy

Topics: Spam and Phishing Detection · Cybercrime and Law Enforcement Studies · Misinformation and Its Impacts