Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions

Daniel Rosenberg; Itai Gat; Amir Feder; Roi Reichart

arXiv:2106.04484 · cs.CV · September 20, 2021 · 1 citation

TL;DR

This paper introduces RAD, a new robustness measure for VQA systems that assesses their consistency when faced with counterfactually augmented data, revealing brittleness and linking robustness to generalization.

Contribution

The paper proposes a novel robustness measure, RAD, for evaluating VQA models' stability against focused counterfactual augmentations, highlighting current models' vulnerabilities.

Findings

01. Current VQA systems are often brittle to counterfactual question modifications.

02. RAD effectively quantifies robustness and exposes failure cases in state-of-the-art models.

03. Robustness measured by RAD correlates with generalization to unseen data.

Abstract

Deep learning algorithms have shown promising results in visual question answering (VQA) tasks, but a more careful look reveals that they often do not understand the rich signal they are being fed with. To understand and better measure the generalization capabilities of VQA systems, we look at their robustness to counterfactually augmented data. Our proposed augmentations are designed to make a focused intervention on a specific property of the question such that the answer changes. Using these augmentations, we propose a new robustness measure, Robustness to Augmented Data (RAD), which measures the consistency of model predictions between original and augmented examples. Through extensive experimentation, we show that RAD, unlike classical accuracy measures, can quantify when state-of-the-art systems are not robust to counterfactuals. We find substantial failure cases which reveal that…
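The abstract describes RAD only as a measure of the consistency of model predictions between original and augmented examples. As a minimal sketch of one way such a score could be computed, the Python below assumes that each original question is paired with exactly one augmented counterpart, and that consistency is normalized by the number of originally correct answers. The function name `rad_score`, the boolean-list inputs, and the normalization are illustrative assumptions, not details taken from this page.

```python
from typing import Sequence


def rad_score(orig_correct: Sequence[bool], aug_correct: Sequence[bool]) -> float:
    """Fraction of originally correct examples that remain correct after augmentation.

    Assumption (not stated on this page): orig_correct[i] and aug_correct[i]
    refer to the same example before and after a focused intervention.
    """
    if len(orig_correct) != len(aug_correct):
        raise ValueError("inputs must be aligned pair by pair")

    # Examples the model answered correctly before the intervention.
    base = sum(1 for o in orig_correct if o)
    if base == 0:
        # Undefined when the model got nothing right on the original data.
        return float("nan")

    # Examples answered correctly both before and after the intervention.
    kept = sum(1 for o, a in zip(orig_correct, aug_correct) if o and a)
    return kept / base


def accuracy(correct: Sequence[bool]) -> float:
    """Plain accuracy, shown for contrast with the consistency score."""
    return sum(1 for c in correct if c) / len(correct)


# Hypothetical per-example correctness flags for four question pairs.
original = [True, True, True, False]
augmented = [True, False, False, True]

print("accuracy on originals:", accuracy(original))
print("accuracy on augmented:", accuracy(augmented))
print("consistency score:", rad_score(original, augmented))
```

In this toy input the model is right on half of the augmented questions, but only one of its three originally correct answers survives the intervention. That gap is the kind of discrepancy a consistency-based measure is meant to surface and plain accuracy is not.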

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition