XSub: Explanation-Driven Adversarial Attack against Blackbox Classifiers via Feature Substitution

Kiana Vu; Phung Lai; Truc Nguyen

arXiv:2409.08919 · cs.LG · September 16, 2024

Open Access

TL;DR

This paper introduces XSub, an explanation-driven adversarial attack method that strategically substitutes important features identified via XAI to mislead black-box classifiers, balancing attack effectiveness and stealthiness with minimal queries.

Contribution

XSub is a novel, cost-effective adversarial attack leveraging feature substitution guided by explanations, capable of attacking black-box models and extending to backdoor attacks.

Findings

01. XSub achieves high attack success with minimal queries.

02. The method balances stealthiness and effectiveness through adjustable feature substitution.

03. XSub is applicable across various AI models and can facilitate backdoor attacks.

Abstract

Despite its significant benefits in enhancing the transparency and trustworthiness of artificial intelligence (AI) systems, explainable AI (XAI) has yet to reach its full potential in real-world applications. One key challenge is that XAI can unintentionally provide adversaries with insights into black-box models, inevitably increasing their vulnerability to various attacks. In this paper, we develop a novel explanation-driven adversarial attack against black-box classifiers based on feature substitution, called XSub. The key idea of XSub is to strategically replace important features (identified via XAI) in the original sample with corresponding important features from a "golden sample" of a different label, thereby increasing the likelihood of the model misclassifying the perturbed sample. The degree of feature substitution is adjustable, allowing us to control how much of the…
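
The substitution step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `substitute_top_features`, the parameter `k` (standing in for the adjustable degree of substitution), and the rank-wise pairing of features are all assumptions made for illustration. The importance scores are assumed to come from some XAI explainer run beforehand, which is not shown here.

```python
import numpy as np

def substitute_top_features(x, golden, x_importance, golden_importance, k):
    """Hypothetical sketch of explanation-guided feature substitution.

    Overwrites the k most important features of `x` with the k most
    important feature values of `golden` (a sample of a different label).
    Pairing is by importance rank, which is an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    golden = np.asarray(golden, dtype=float)

    # Positions of the k highest-scoring features in each sample.
    dst = np.argsort(-np.asarray(x_importance, dtype=float), kind="stable")[:k]
    src = np.argsort(-np.asarray(golden_importance, dtype=float), kind="stable")[:k]

    # Copy so the original sample is left untouched.
    x_adv = x.copy()
    x_adv[dst] = golden[src]
    return x_adv

# Toy example with made-up values and importance scores.
x = [1, 2, 3, 4]
golden = [10, 20, 30, 40]
x_imp = [0.1, 0.9, 0.5, 0.2]       # features 1 and 2 matter most for x
golden_imp = [0.8, 0.1, 0.3, 0.7]  # features 0 and 3 matter most for golden
x_adv = substitute_top_features(x, golden, x_imp, golden_imp, k=2)
```

A larger `k` replaces more of the original sample, which would be expected to raise the chance of misclassification while making the perturbation easier to notice; `k=0` leaves the sample unchanged.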

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Explainable Artificial Intelligence (XAI)