Explainable Transformer-Based Email Phishing Classification with Adversarial Robustness
Sajad U P

TL;DR
This paper introduces a hybrid transformer-based framework for email phishing detection that combines adversarial training and explainability techniques to improve robustness and transparency against AI-generated threats.
Contribution
It proposes an approach that integrates DistilBERT with adversarial training and LIME-based explainable AI (XAI), along with Flan-T5 for user-friendly explanations, to enhance the robustness and interpretability of phishing detection.
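To make the explainability component more concrete, here is a minimal sketch of perturbation-based word attribution, the general idea that LIME builds on. It is not the paper's implementation and not the LIME library's algorithm (LIME fits a locally weighted linear surrogate, while this sketch only compares average scores with and without each word). The classifier `toy_phishing_score`, the `SUSPICIOUS` word weights, and the `explain` helper are all hypothetical stand-ins; in the described system the scoring function would be the fine-tuned DistilBERT model.

```python
import random

# Hypothetical keyword weights standing in for a trained classifier.
SUSPICIOUS = {"verify": 0.3, "password": 0.3, "urgent": 0.2, "click": 0.1}


def toy_phishing_score(text: str) -> float:
    """Toy stand-in for a phishing classifier: returns a score in [0, 1]."""
    words = set(text.lower().split())
    return min(1.0, sum(w for k, w in SUSPICIOUS.items() if k in words))


def explain(text, predict, num_samples=500, seed=0):
    """Rank words by (mean score when kept) - (mean score when dropped).

    Each sample randomly drops about half of the words and re-scores
    the remaining text, so words that raise the score get larger weights.
    """
    rng = random.Random(seed)
    tokens = text.split()
    kept_sum = [0.0] * len(tokens)
    kept_n = [0] * len(tokens)
    dropped_sum = [0.0] * len(tokens)
    dropped_n = [0] * len(tokens)

    for _ in range(num_samples):
        mask = [rng.random() < 0.5 for _ in tokens]
        score = predict(" ".join(t for t, m in zip(tokens, mask) if m))
        for i, keep in enumerate(mask):
            if keep:
                kept_sum[i] += score
                kept_n[i] += 1
            else:
                dropped_sum[i] += score
                dropped_n[i] += 1

    weights = []
    for i, tok in enumerate(tokens):
        kept = kept_sum[i] / kept_n[i] if kept_n[i] else 0.0
        dropped = dropped_sum[i] / dropped_n[i] if dropped_n[i] else 0.0
        weights.append((tok, kept - dropped))
    # Most influential words first.
    return sorted(weights, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    email = "urgent please verify your password today"
    for token, weight in explain(email, toy_phishing_score):
        print(f"{token:>10s}  {weight:+.3f}")
```

The ranked word weights are the kind of output that a language model such as Flan-T5 could then rephrase as a plain-language warning for the user; how the paper wires these stages together is not specified in this summary.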
Findings
Improved resilience of phishing detection against adversarial attacks.
Enhanced transparency through LIME explanations.
Effective generation of plain-language security narratives.
Abstract
Phishing and related cyber threats are becoming more varied and technologically advanced. Among these, email-based phishing remains the most dominant and persistent threat. These attacks exploit human vulnerabilities to disseminate malware or gain unauthorized access to sensitive information. Deep learning (DL) models, particularly transformer-based models, have significantly enhanced phishing mitigation through their contextual understanding of language. However, recent threats, specifically Artificial Intelligence (AI)-generated phishing attacks, are reducing the resilience of phishing detectors. In response, adversarial training has shown promise against AI-generated phishing threats. This study presents a hybrid approach that uses DistilBERT, a smaller, faster, and lighter version of the BERT transformer model, for email classification. Robustness against…
Taxonomy
Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Adversarial Robustness in Machine Learning
