XSub: Explanation-Driven Adversarial Attack against Blackbox Classifiers via Feature Substitution
Kiana Vu, Phung Lai, Truc Nguyen

TL;DR
This paper introduces XSub, an explanation-driven adversarial attack method that strategically substitutes important features identified via XAI to mislead black-box classifiers, balancing attack effectiveness and stealthiness with minimal queries.
Contribution
XSub is a novel, cost-effective adversarial attack leveraging feature substitution guided by explanations, capable of attacking black-box models and extending to backdoor attacks.
Findings
XSub achieves high attack success with minimal queries.
The method balances stealthiness and effectiveness through adjustable feature substitution.
XSub is applicable across various AI models and can facilitate backdoor attacks.
Abstract
Despite its significant benefits in enhancing the transparency and trustworthiness of artificial intelligence (AI) systems, explainable AI (XAI) has yet to reach its full potential in real-world applications. One key challenge is that XAI can unintentionally provide adversaries with insights into black-box models, inevitably increasing their vulnerability to various attacks. In this paper, we develop a novel explanation-driven adversarial attack against black-box classifiers based on feature substitution, called XSub. The key idea of XSub is to strategically replace important features (identified via XAI) in the original sample with corresponding important features from a "golden sample" of a different label, thereby increasing the likelihood of the model misclassifying the perturbed sample. The degree of feature substitution is adjustable, allowing us to control how much of the…
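The substitution step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-feature importance scores have already been obtained from some XAI explainer, and it takes the simplest reading of "corresponding" features: the positions ranked most important in the original sample receive the golden sample's values at those same positions. The function name `substitute_top_features` and the parameter `k` (the number of features replaced, standing in for the adjustable degree of substitution) are hypothetical.

```python
import numpy as np


def substitute_top_features(x, golden, importance, k):
    """Return a copy of x whose k most important features come from golden.

    x          -- original sample (any shape)
    golden     -- "golden sample" of a different label, same shape as x
    importance -- per-feature importance scores for x, same shape as x
                  (assumed to come from an XAI explainer; not computed here)
    k          -- number of features to substitute (hypothetical knob)
    """
    x = np.asarray(x, dtype=float)
    golden = np.asarray(golden, dtype=float)
    importance = np.asarray(importance, dtype=float)

    if x.shape != golden.shape or x.shape != importance.shape:
        raise ValueError("x, golden and importance must share one shape")
    if not 0 <= k <= x.size:
        raise ValueError("k must be between 0 and the number of features")

    # Rank features by absolute importance, largest first.
    order = np.argsort(-np.abs(importance).ravel(), kind="stable")
    top = order[:k]

    # Copy the golden sample's values into the top-k positions only.
    perturbed = x.copy().ravel()
    perturbed[top] = golden.ravel()[top]
    return perturbed.reshape(x.shape)


# Toy example with made-up numbers: features 1 and 2 carry the most importance.
x = [0.1, 0.9, 0.3, 0.5]
golden = [0.8, 0.2, 0.7, 0.4]
importance = [0.05, 0.60, -0.30, 0.10]

adv = substitute_top_features(x, golden, importance, k=2)
# adv keeps features 0 and 3 from x and takes features 1 and 2 from golden.
```

Raising `k` moves the perturbed sample closer to the golden sample, which makes misclassification more likely and the change easier to notice. Lowering it does the opposite. In a black-box setting, each candidate `k` would cost one query to the target model to check whether the predicted label changed.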
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Explainable Artificial Intelligence (XAI)
