Adversarial Attacks on VQA-NLE: Exposing and Alleviating Inconsistencies in Visual Question Answering Explanations
Yahsin Yeh, Yilun Wu, Bokai Ruan, Honghan Shuai

TL;DR
This paper shows that VQA-NLE systems are susceptible to adversarial attacks that cause inconsistent explanations, and proposes a knowledge-based mitigation to improve robustness.
Contribution
It applies an existing adversarial strategy for perturbing questions and introduces a novel strategy that minimally alters images in VQA-NLE, along with a knowledge-based defense mechanism to enhance explanation consistency.
Findings
Adversarial attacks can induce contradictory explanations in VQA-NLE models.
Knowledge-based mitigation improves explanation consistency and model robustness.
These weaknesses raise security and reliability concerns for VQA-NLE systems.
Abstract
Natural language explanations in visual question answering (VQA-NLE) aim to make black-box models more transparent by elucidating their decision-making processes. However, we find that existing VQA-NLE systems can produce inconsistent explanations and reach conclusions without genuinely understanding the underlying context, exposing weaknesses in either their inference pipeline or explanation-generation mechanism. To highlight these vulnerabilities, we not only leverage an existing adversarial strategy to perturb questions but also propose a novel strategy that minimally alters images to induce contradictory or spurious outputs. We further introduce a mitigation method that leverages external knowledge to alleviate these inconsistencies, thereby bolstering model robustness. Extensive evaluations on two standard benchmarks and two widely used VQA-NLE models underscore the effectiveness…
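As a rough illustration of what "inconsistent explanations under minimal image changes" means, the sketch below measures how often a model's (answer, explanation) pair stays the same when an image is perturbed within a small bound. Everything here is an assumption for illustration: `toy_vqa_nle` is a hypothetical stand-in model, the perturbation is random bounded noise rather than the optimized attack the paper proposes, and `consistency_rate` is not a metric taken from the paper.

```python
import random


def perturb_image(pixels, epsilon, rng):
    """Shift each pixel by at most `epsilon` and clip the result to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.uniform(-epsilon, epsilon))) for p in pixels]


def toy_vqa_nle(pixels, question):
    """Hypothetical stand-in model: answers from mean brightness only."""
    mean = sum(pixels) / len(pixels)
    if mean > 0.5:
        return "yes", "because the scene is brightly lit"
    return "no", "because the scene is dark"


def consistency_rate(model, pixels, question, epsilon, trials=100, seed=0):
    """Fraction of perturbed images whose (answer, explanation) matches the original."""
    rng = random.Random(seed)
    reference = model(pixels, question)
    same = sum(
        model(perturb_image(pixels, epsilon, rng), question) == reference
        for _ in range(trials)
    )
    return same / trials


if __name__ == "__main__":
    question = "Is it daytime?"
    # An image far from the toy model's decision boundary stays consistent.
    print(consistency_rate(toy_vqa_nle, [0.9] * 16, question, epsilon=0.05))
    # An image sitting on the boundary is easy to flip with tiny changes.
    print(consistency_rate(toy_vqa_nle, [0.5] * 16, question, epsilon=0.05))
```

A real evaluation would replace the toy model with an actual VQA-NLE system and compare explanations semantically rather than by exact string match.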
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning
