Black-box Attacks on Image Activity Prediction and its Natural Language Explanations

Alina Elena Baia; Valentina Poggioni; Andrea Cavallaro

arXiv:2310.00503·cs.CV·September 30, 2024

Open Access

TL;DR

This paper investigates how vulnerable the natural language explanations of a self-rationalizing image activity recognition model are to black-box adversarial attacks, and shows that an attacker with access only to the model's outputs can manipulate them.

Contribution

It is the first study to evaluate the robustness of multimodal XAI explanations against black-box attacks in image activity recognition models.

Findings

01

Adversarial perturbations can change the generated explanations without changing the predicted activity.

02

Black-box attacks require only access to model outputs, not internal details.

03

Explanations can be manipulated to be unfaithful to the model's true reasoning.

Abstract

Explainable AI (XAI) methods aim to describe the decision process of deep neural networks. Early XAI methods produced visual explanations, whereas more recent techniques generate multimodal explanations that include textual information and visual representations. Visual XAI methods have been shown to be vulnerable to white-box and gray-box adversarial attacks, with an attacker having full or partial knowledge of and access to the target system. As the vulnerabilities of multimodal XAI models have not been examined, in this paper we assess for the first time the robustness to black-box attacks of the natural language explanations generated by a self-rationalizing image-based activity recognition model. We generate unrestricted, spatially variant perturbations that disrupt the association between the predictions and the corresponding explanations to mislead the model into generating…
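The black-box setting described above can be illustrated with a minimal sketch. It is not the authors' method: it swaps in plain random search, and `toy_model`, `black_box_attack`, and all parameters are hypothetical names introduced here for illustration. The attacker only calls the model and reads its two outputs (label and explanation), then keeps a perturbed input whose label is unchanged but whose explanation differs.

```python
import random


def toy_model(image):
    """Hypothetical stand-in for a self-rationalizing model.

    Takes a flat list of pixel values in [0, 1] and returns
    (label, explanation). Not a real activity recognition model.
    """
    mean = sum(image) / len(image)
    label = "running" if mean > 0.5 else "sitting"
    # In this toy, the explanation depends on a different feature than the label.
    if image[0] > 0.5:
        explanation = "because the person is outdoors"
    else:
        explanation = "because the person is indoors"
    return label, explanation


def black_box_attack(model, image, max_queries=200, step=0.3, seed=0):
    """Random-search sketch: query the model, never inspect its internals.

    Returns (adversarial_image, queries_used), or (None, max_queries)
    if no candidate kept the label while changing the explanation.
    """
    rng = random.Random(seed)
    label, explanation = model(image)
    for query in range(1, max_queries + 1):
        # Perturb every pixel independently and clip back to [0, 1].
        candidate = [
            min(1.0, max(0.0, p + rng.uniform(-step, step))) for p in image
        ]
        new_label, new_explanation = model(candidate)
        # Success: same prediction, different explanation.
        if new_label == label and new_explanation != explanation:
            return candidate, query
    return None, max_queries


if __name__ == "__main__":
    clean = [0.6, 0.9, 0.9, 0.9]
    adversarial, used = black_box_attack(toy_model, clean)
    print(toy_model(clean), toy_model(adversarial), used)
```

The acceptance test (label preserved, explanation changed) is the only part meant to mirror the attack goal stated in the abstract. The search strategy and perturbation model are placeholders.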

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education