Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability
Shova Kuikel, Aritran Piplai, Palvi Aggarwal

TL;DR
This paper evaluates large language models for phishing detection, focusing on their accuracy, explainability, and internal consistency, and compares models like BERT, Llama, and Wizard using specialized metrics.
Contribution
It introduces a comprehensive evaluation of LLMs for phishing detection, emphasizing explainability and consistency, and applies novel metrics such as CC-SHAP to assess faithfulness.
Findings
Llama models show higher explanation consistency despite lower accuracy.
Wizard models achieve better classification accuracy but lower explanation alignment.
The study highlights trade-offs between accuracy and explainability in LLMs for cybersecurity.
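To make the notion of "explanation alignment" concrete, here is a minimal sketch of one way such a score could be computed: take per-token contribution values for the model's prediction and for its explanation, normalize each, and compare them with cosine similarity. This is an illustration in the spirit of contribution-comparison metrics, not the paper's implementation; the function names and the toy contribution values are hypothetical, and in practice the contributions would come from an attribution method such as SHAP.

```python
import math


def normalize(contribs):
    """Scale contribution values so their absolute values sum to 1."""
    total = sum(abs(c) for c in contribs)
    if total == 0:
        raise ValueError("all contributions are zero")
    return [c / total for c in contribs]


def alignment_score(pred_contribs, expl_contribs):
    """Cosine similarity between two per-token contribution vectors.

    Values near 1 mean the same input tokens drive both the prediction
    and the explanation; values near -1 mean they pull in opposite
    directions. (Hypothetical helper, assumed for illustration.)
    """
    if len(pred_contribs) != len(expl_contribs):
        raise ValueError("contribution vectors must have the same length")
    p = normalize(pred_contribs)
    e = normalize(expl_contribs)
    dot = sum(a * b for a, b in zip(p, e))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_e = math.sqrt(sum(b * b for b in e))
    return dot / (norm_p * norm_e)


# Toy example: made-up contributions for the tokens of a short email.
tokens = ["verify", "your", "account", "now"]
prediction_contribs = [0.6, 0.05, 0.25, 0.1]    # assumed values
explanation_contribs = [0.5, 0.1, 0.3, 0.1]     # assumed values
score = alignment_score(prediction_contribs, explanation_contribs)
print(f"alignment score: {score:.3f}")
```

Under this kind of score, a model can classify correctly while its explanation draws on different tokens than its prediction does, which is the accuracy versus alignment trade-off the findings describe.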
Abstract
Phishing attacks remain one of the most prevalent and persistent cybersecurity threats, with attackers continuously evolving and intensifying their tactics to evade general detection systems. Despite significant advances in artificial intelligence and machine learning, faithfully reproducing the interpretable reasoning with classification and explainability that underpin phishing judgments remains challenging. Owing to recent advances in Natural Language Processing, Large Language Models (LLMs) show promise for improving domain-specific phishing classification tasks. However, enhancing the reliability and robustness of classification models requires not only accurate predictions from LLMs but also consistent and trustworthy explanations that align with those predictions. Therefore, a key question remains: can LLMs not only classify phishing emails accurately but…
Taxonomy
Topics: Misinformation and Its Impacts · Topic Modeling · Sentiment Analysis and Opinion Mining
Methods: Attention Dropout · Dropout · Softmax · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · Contrastive Learning · BERT
