When and How to Fool Explainable Models (and Humans) with Adversarial Examples

Jon Vadillo; Roberto Santana; Jose A. Lozano

arXiv:2107.01943 · cs.LG · February 18, 2025

TL;DR

This paper explores the possibilities and limits of adversarial attacks on explainable machine learning models whose inputs, classifications, and explanations are assessed by humans, and proposes a comprehensive framework for studying whether and how such attacks can be generated.

Contribution

It introduces a novel framework for studying adversarial examples in explainable models, considering human factors and diverse attack scenarios.

Findings

01. Extends the notion of adversarial examples to explainable models.

02. Proposes a comprehensive attack framework that accounts for human assessment.

03. Illustrates novel attack paradigms for deceiving explainable models.

Abstract

Reliable deployment of machine learning models such as neural networks continues to be challenging due to several limitations. Some of the main shortcomings are the lack of interpretability and the lack of robustness against adversarial examples or out-of-distribution inputs. In this exploratory review, we explore the possibilities and limits of adversarial attacks for explainable machine learning models. First, we extend the notion of adversarial examples to fit in explainable machine learning scenarios, in which the inputs, the output classifications and the explanations of the model's decisions are assessed by humans. Next, we propose a comprehensive framework to study whether (and how) adversarial examples can be generated for explainable models under human assessment, introducing and illustrating novel attack paradigms. In particular, our framework considers a wide range of…

Code & Models

Repositories

vadel/ae4xai (tf, Official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications