Empirical Analysis of Adversarial Robustness and Explainability Drift in Cybersecurity Classifiers
Mona Rajhans, Vishal Khawarey

TL;DR
This study empirically evaluates how adversarial attacks affect the robustness and explainability of cybersecurity classifiers. It proposes a new robustness metric and analyzes feature sensitivity, with the aim of supporting more trustworthy AI security systems.
Contribution
The paper introduces the Robustness Index (RI), a metric for quantifying adversarial robustness, and provides empirical insights into the relationship between robustness and interpretability in cybersecurity models.
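The abstract defines RI as the area under the accuracy-perturbation curve. A minimal sketch of that computation follows, assuming accuracy has been measured at a handful of perturbation budgets and that the area is taken with the trapezoidal rule. The function name `robustness_index`, the `normalize` option, and the sample numbers are illustrative assumptions, not details taken from the paper.

```python
def robustness_index(epsilons, accuracies, normalize=True):
    """Trapezoidal area under the accuracy-vs-epsilon curve.

    `normalize` is an assumption (not stated in the source): it divides the
    area by the epsilon range, so a model that keeps 100% accuracy at every
    budget scores 1.0.
    """
    if len(epsilons) != len(accuracies) or len(epsilons) < 2:
        raise ValueError("need at least two matching (epsilon, accuracy) points")

    # Sort by epsilon so the curve is integrated left to right.
    pairs = sorted(zip(epsilons, accuracies))

    area = 0.0
    for (e0, a0), (e1, a1) in zip(pairs, pairs[1:]):
        area += 0.5 * (a0 + a1) * (e1 - e0)

    if normalize:
        span = pairs[-1][0] - pairs[0][0]
        if span == 0:
            raise ValueError("epsilon values must not all be identical")
        area /= span
    return area


# Hypothetical measurements for illustration only, not results from the paper.
eps = [0.0, 0.05, 0.10]
acc = [0.9, 0.7, 0.5]
ri = robustness_index(eps, acc)
```

Under this normalization, two models can be compared on a single scale as long as they are evaluated over the same range of perturbation budgets.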
Findings
Adversarial training improves robustness, raising the Robustness Index (RI) by up to 9%.
Robustness and interpretability degrade together under adversarial perturbations.
Gradient and SHAP analyses identify vulnerable input features.
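As a rough illustration of what a gradient-based sensitivity ranking looks like, the sketch below estimates the mean absolute gradient of a model score with respect to each input feature. It uses central finite differences as a stand-in for analytic gradients, which is an assumption made here for self-containment. The helper `feature_sensitivity` and the toy scoring function are hypothetical and do not reproduce the paper's models or results.

```python
def feature_sensitivity(score, samples, h=1e-4):
    """Mean absolute partial derivative of `score` per feature.

    `score` maps a feature list to a float. Derivatives are approximated
    with central finite differences of step `h`.
    """
    n_features = len(samples[0])
    totals = [0.0] * n_features
    for x in samples:
        for j in range(n_features):
            up = list(x)
            down = list(x)
            up[j] += h
            down[j] -= h
            totals[j] += abs(score(up) - score(down)) / (2 * h)
    return [t / len(samples) for t in totals]


# Toy linear score standing in for a trained classifier (hypothetical).
def toy_score(x):
    return 3.0 * x[0] - 0.5 * x[1] + 0.0 * x[2]


samples = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
sens = feature_sensitivity(toy_score, samples)

# Features ordered from most to least sensitive.
ranking = sorted(range(len(sens)), key=lambda j: -sens[j])
```

Features at the top of such a ranking are the ones where a small perturbation moves the score the most, which is the sense in which they can be called vulnerable.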
Abstract
Machine learning (ML) models are increasingly deployed in cybersecurity applications such as phishing detection and network intrusion prevention. However, these models remain vulnerable to adversarial perturbations: small, deliberate input modifications that can degrade detection accuracy and compromise interpretability. This paper presents an empirical study of adversarial robustness and explainability drift across two cybersecurity domains: phishing URL classification and network intrusion detection. We evaluate the impact of L∞-bounded Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) perturbations on model accuracy and introduce a quantitative metric, the Robustness Index (RI), defined as the area under the accuracy-perturbation curve. Gradient-based feature sensitivity and SHAP-based attribution drift analyses reveal which input features are most…
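For readers unfamiliar with the two attacks named in the abstract, here is a minimal sketch of L∞-bounded FGSM and PGD against a logistic-regression classifier, written with the standard library only. The model, its weights, the sample input, and the step settings are all hypothetical and chosen for illustration. The source does not say which architectures or hyperparameters the authors used.

```python
import math


def predict(x, w, b):
    """Probability of the positive class under logistic regression."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient(x, y, w, b):
    """Gradient of binary cross-entropy with respect to the input x.

    For logistic regression this has the closed form (p - y) * w.
    """
    p = predict(x, w, b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, eps):
    """Single step of size eps along the sign of the loss gradient."""
    g = loss_gradient(x, y, w, b)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def pgd(x, y, w, b, eps, alpha, steps):
    """Iterated signed-gradient steps, projected back into the eps-ball."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_gradient(x_adv, y, w, b)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project onto the L-infinity ball of radius eps around the original x.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


# Hypothetical model and sample, for illustration only.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1

x_fgsm = fgsm(x, y, w, b, eps=0.3)
x_pgd = pgd(x, y, w, b, eps=0.3, alpha=0.1, steps=5)
```

In this toy case both attacks push the sample across the decision boundary while changing each feature by at most `eps`. Real feature-space attacks on URL or network-flow data would also need to respect domain constraints such as valid value ranges, which this sketch leaves out.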
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Spam and Phishing Detection
