Empirical Analysis of Adversarial Robustness and Explainability Drift in Cybersecurity Classifiers

Mona Rajhans; Vishal Khawarey

arXiv:2602.06395 · cs.CR · February 9, 2026

Open Access

TL;DR

This study empirically evaluates how adversarial attacks affect the robustness and explainability of cybersecurity classifiers, proposing a new metric and analyzing feature sensitivity to improve trustworthy AI security systems.

Contribution

It introduces the Robustness Index (RI) for quantifying adversarial robustness and provides empirical insights into the relationship between robustness and interpretability in cybersecurity models.

Findings

01. Adversarial training improves robustness, raising the Robustness Index (RI) by up to 9%.

02. Robustness and interpretability degrade together under adversarial perturbations.

03. Gradient and SHAP analyses identify vulnerable input features.
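
The gradient-based part of the third finding can be illustrated with a minimal sketch. It assumes a plain logistic-regression classifier with binary cross-entropy loss (the paper's actual models are not specified on this page), and it scores each feature by the mean absolute gradient of the loss with respect to that input. The names `input_gradients` and `feature_sensitivity`, the weights, and the toy data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradients(w, b, X, y):
    """Gradient of binary cross-entropy w.r.t. each input feature.

    For logistic regression, dL/dx = (sigmoid(w.x + b) - y) * w.
    """
    p = sigmoid(X @ w + b)
    return (p - y)[:, None] * w[None, :]

def feature_sensitivity(w, b, X, y):
    """Mean absolute input gradient per feature (assumed sensitivity score)."""
    return np.abs(input_gradients(w, b, X, y)).mean(axis=0)

# Hypothetical toy model and data, for illustration only.
w = np.array([2.0, -0.5, 0.0])
b = 0.1
X = np.array([[0.2, 1.0, 3.0],
              [1.5, -0.3, 0.4],
              [-0.7, 0.8, 1.1]])
y = np.array([1.0, 0.0, 1.0])

sens = feature_sensitivity(w, b, X, y)
ranking = np.argsort(-sens)  # most sensitive feature first
print("sensitivity per feature:", sens)
print("ranking (most to least sensitive):", ranking)
```

In this toy setup the feature with the largest absolute weight ranks first and the zero-weight feature has zero sensitivity, which is what a linear model should give. For non-linear models the gradients would come from automatic differentiation instead of a closed form.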

Abstract

Machine learning (ML) models are increasingly deployed in cybersecurity applications such as phishing detection and network intrusion prevention. However, these models remain vulnerable to adversarial perturbations: small, deliberate input modifications that can degrade detection accuracy and compromise interpretability. This paper presents an empirical study of adversarial robustness and explainability drift across two cybersecurity domains: phishing URL classification and network intrusion detection. We evaluate the impact of L∞-bounded Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) perturbations on model accuracy and introduce a quantitative metric, the Robustness Index (RI), defined as the area under the accuracy-perturbation curve. Gradient-based feature sensitivity and SHAP-based attribution drift analyses reveal which input features are most…
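
To make the RI definition concrete, here is a minimal sketch under stated assumptions. It applies an L∞-bounded FGSM step to a hypothetical logistic-regression classifier at several perturbation budgets, records accuracy at each budget, and integrates the resulting curve with the trapezoidal rule. Dividing the area by the width of the perturbation range (so that RI lies in [0, 1]) is an assumption of this sketch, not something the abstract states. The model, the data, and the function names `fgsm`, `accuracy`, and `robustness_index` are illustrative only, and PGD is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(w, b, X, y, eps):
    """One FGSM step: x' = x + eps * sign(dL/dx), so ||x' - x||_inf <= eps."""
    grad = (sigmoid(X @ w + b) - y)[:, None] * w[None, :]
    return X + eps * np.sign(grad)

def accuracy(w, b, X, y):
    pred = sigmoid(X @ w + b) >= 0.5
    return float(np.mean(pred == (y == 1)))

def robustness_index(eps_grid, accs):
    """Area under the accuracy-perturbation curve (trapezoidal rule),
    divided by the eps range. The normalization is an assumption."""
    eps_grid = np.asarray(eps_grid, dtype=float)
    accs = np.asarray(accs, dtype=float)
    area = np.sum(0.5 * (accs[1:] + accs[:-1]) * np.diff(eps_grid))
    return float(area / (eps_grid[-1] - eps_grid[0]))

# Hypothetical synthetic data and model, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
w = np.array([1.5, -2.0, 0.5, 0.0])
b = 0.0
y = (X @ w + b > 0).astype(float)

eps_grid = np.linspace(0.0, 0.5, 6)
accs = [accuracy(w, b, fgsm(w, b, X, y, eps), y) for eps in eps_grid]
ri = robustness_index(eps_grid, accs)
print("accuracy per eps:", accs)
print("RI:", ri)
```

With this normalization, a model whose accuracy stays at 1.0 across the whole grid scores RI = 1, and a model whose accuracy collapses immediately scores close to 0. Comparing RI before and after adversarial training is one way to read the reported improvement.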

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Spam and Phishing Detection