Adversarial Attacks on VQA-NLE: Exposing and Alleviating Inconsistencies in Visual Question Answering Explanations

Yahsin Yeh; Yilun Wu; Bokai Ruan; Honghan Shuai

arXiv:2508.12430 · cs.CV · August 19, 2025

Open Access

TL;DR

This paper shows that VQA-NLE systems are susceptible to adversarial attacks that induce inconsistent explanations, and it proposes a knowledge-based mitigation to improve robustness.

Contribution

It introduces novel adversarial strategies for perturbing questions and images in VQA-NLE, along with a knowledge-based defense mechanism to enhance explanation consistency.

Findings

01

Adversarial attacks can induce contradictory explanations in VQA-NLE models.

02

Knowledge-based mitigation improves explanation consistency and model robustness.

03

These weaknesses raise security and reliability concerns for VQA-NLE systems.

Abstract

Natural language explanations in visual question answering (VQA-NLE) aim to make black-box models more transparent by elucidating their decision-making processes. However, we find that existing VQA-NLE systems can produce inconsistent explanations and reach conclusions without genuinely understanding the underlying context, exposing weaknesses in either their inference pipeline or explanation-generation mechanism. To highlight these vulnerabilities, we not only leverage an existing adversarial strategy to perturb questions but also propose a novel strategy that minimally alters images to induce contradictory or spurious outputs. We further introduce a mitigation method that leverages external knowledge to alleviate these inconsistencies, thereby bolstering model robustness. Extensive evaluations on two standard benchmarks and two widely used VQA-NLE models underscore the effectiveness…
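As a rough illustration of what "minimally altering" an image and detecting a contradictory output could look like, the sketch below bounds each pixel change by a small epsilon and then checks whether a model's answer flips. This is not the paper's attack, which is not specified on this page. The names `perturb_image`, `is_inconsistent`, and `toy_model` are hypothetical, the image is a plain list of grayscale values in [0, 1], and the "model" is a stand-in that thresholds mean brightness.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale pixels in [0, 1]; a stand-in for real image tensors
Model = Callable[[Image, str], Tuple[str, str]]  # hypothetical: returns (answer, explanation)


def perturb_image(image: Image, delta: Image, epsilon: float) -> Image:
    """Add delta to image, keeping each pixel within epsilon of the original and in [0, 1]."""
    out = []
    for row, delta_row in zip(image, delta):
        new_row = []
        for pixel, d in zip(row, delta_row):
            d = max(-epsilon, min(epsilon, d))  # bound the change per pixel
            new_row.append(max(0.0, min(1.0, pixel + d)))  # keep a valid pixel value
        out.append(new_row)
    return out


def is_inconsistent(model: Model, image: Image, adv_image: Image, question: str) -> bool:
    """Flag a case where a bounded image change flips the model's answer."""
    answer, _ = model(image, question)
    adv_answer, _ = model(adv_image, question)
    return answer != adv_answer


def toy_model(image: Image, question: str) -> Tuple[str, str]:
    """Toy stand-in for a VQA-NLE model (assumption): answers from mean brightness."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    answer = "yes" if mean > 0.5 else "no"
    return answer, f"because the mean brightness is {mean:.2f}"


# Usage: a change of at most 0.02 per pixel is enough to flip the toy model's answer.
image = [[0.52, 0.50], [0.50, 0.50]]
delta = [[-1.0, -1.0], [-1.0, -1.0]]  # requested change; clipped to epsilon inside
adv_image = perturb_image(image, delta, epsilon=0.02)
flipped = is_inconsistent(toy_model, image, adv_image, "Is the scene bright?")
```

In a real setting the perturbation would be searched for rather than fixed by hand, and the generated explanations would be compared as well as the answers; both steps are left out here to keep the sketch small.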

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning