Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability

Shova Kuikel; Aritran Piplai; Palvi Aggarwal

arXiv:2506.13746·cs.CR·June 17, 2025

TL;DR

This paper evaluates large language models for phishing detection, focusing on their accuracy, explainability, and internal consistency, and compares models like BERT, Llama, and Wizard using specialized metrics.

Contribution

It introduces a comprehensive evaluation of LLMs for phishing detection, emphasizing explainability and consistency, and applies novel metrics like CC SHAP to assess faithfulness.
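As a rough illustration of the idea behind attribution-based faithfulness metrics such as CC SHAP, the sketch below compares how much each input token contributes to the model's prediction with how much it contributes to the model's explanation. It assumes the per-token contribution values (for example, SHAP values) have already been computed elsewhere; the function names `normalize` and `cosine_alignment` and the sample numbers are hypothetical, and this is not the paper's implementation.

```python
import math


def normalize(contribs):
    """Scale contributions so their absolute values sum to 1 (signs are kept)."""
    total = sum(abs(c) for c in contribs)
    if total == 0:
        return [0.0 for _ in contribs]
    return [c / total for c in contribs]


def cosine_alignment(pred_contribs, expl_contribs):
    """Cosine similarity between two per-token contribution vectors.

    pred_contribs: contribution of each input token to the predicted label.
    expl_contribs: contribution of each input token to the generated
                   explanation (assumed already aggregated per input token).
    Returns a value in [-1, 1]; higher means the explanation relies on the
    same input tokens as the prediction.
    """
    if len(pred_contribs) != len(expl_contribs):
        raise ValueError("both vectors must cover the same input tokens")
    p = normalize(pred_contribs)
    e = normalize(expl_contribs)
    dot = sum(a * b for a, b in zip(p, e))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_e = math.sqrt(sum(b * b for b in e))
    if norm_p == 0 or norm_e == 0:
        return 0.0  # no signal in at least one vector
    return dot / (norm_p * norm_e)


# Hypothetical contributions for a four-token email snippet.
prediction_side = [0.40, 0.05, -0.10, 0.30]
explanation_side = [0.35, 0.10, -0.05, 0.25]
score = cosine_alignment(prediction_side, explanation_side)
```

A score near 1 would suggest the explanation draws on the same tokens that drove the classification, while a score near 0 or below would suggest the two diverge.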

Findings

01

Llama models show higher explanation consistency despite lower accuracy.

02

Wizard models achieve better classification accuracy but lower explanation alignment.

03

The study highlights trade-offs between accuracy and explainability in LLMs for cybersecurity.

Abstract

Phishing attacks remain one of the most prevalent and persistent cybersecurity threats, with attackers continuously evolving and intensifying their tactics to evade general detection systems. Despite significant advances in artificial intelligence and machine learning, faithfully reproducing the interpretable reasoning, classification, and explainability that underpin phishing judgments remains challenging. Owing to recent advances in Natural Language Processing, Large Language Models (LLMs) show promise for improving domain-specific phishing classification tasks. However, enhancing the reliability and robustness of classification models requires not only accurate predictions from LLMs but also consistent and trustworthy explanations that align with those predictions. Therefore, a key question remains: can LLMs not only classify phishing emails accurately but…

Code & Models

Repositories

PsyberSecLab/Fine-Tuning-and-Explainability-for-Phishing-Detection (official)

Taxonomy

Topics: Misinformation and Its Impacts · Topic Modeling · Sentiment Analysis and Opinion Mining

Methods: Attention Dropout · Dropout · Softmax · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · Contrastive Learning · Wizard: Unsupervised goats tracking algorithm · BERT