Human-Adversarial Visual Question Answering

Sasha Sheng; Amanpreet Singh; Vedanuj Goswami; Jose Alberto Lopez Magana; Wojciech Galuba; Devi Parikh; Douwe Kiela

arXiv:2106.02280 · cs.CV · June 7, 2021 · 22 cites

TL;DR

This paper introduces AdVQA, a benchmark using human-adversarial examples to evaluate and challenge state-of-the-art VQA models, revealing their vulnerabilities and guiding future improvements.

Contribution

It presents a novel adversarial benchmark for VQA, created through human interaction to find questions that fool current models, highlighting their weaknesses.

Findings

01. A wide range of state-of-the-art models perform poorly on the human-adversarial examples.

02. Analysis of the collected adversarial examples reveals specific weaknesses in current VQA models.

03. The benchmark and its analysis provide guidance on future research directions.

Abstract

Performance on the most commonly used Visual Question Answering dataset (VQA v2) is starting to approach human accuracy. However, in interacting with state-of-the-art VQA models, it is clear that the problem is far from being solved. In order to stress test VQA models, we benchmark them against human-adversarial examples. Human subjects interact with a state-of-the-art VQA model, and for each image in the dataset, attempt to find a question where the model's predicted answer is incorrect. We find that a wide range of state-of-the-art models perform poorly when evaluated on these examples. We conduct an extensive analysis of the collected adversarial examples and provide guidance on future research directions. We hope that this Adversarial VQA (AdVQA) benchmark can help drive progress in the field and advance the state of the art.
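The collection procedure described in the abstract (a person sees an image, asks the model questions, and keeps a question once the model answers it incorrectly) can be sketched roughly as follows. This is a minimal illustration and not the authors' annotation pipeline: the function names, the `AdversarialExample` record, the callback signatures, and the `max_attempts` limit are all hypothetical, and the human annotator and the judgment of correctness are abstracted as plain callbacks.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class AdversarialExample:
    # Hypothetical record for one question that fooled the model.
    image_id: str
    question: str
    model_answer: str


def collect_adversarial_examples(
    image_ids: Iterable[str],
    model_predict: Callable[[str, str], str],         # (image_id, question) -> answer
    propose_question: Callable[[str, int], str],      # stands in for the human annotator
    is_answer_wrong: Callable[[str, str, str], bool], # stands in for the correctness judgment
    max_attempts: int = 3,                            # assumed limit, not taken from the paper
) -> List[AdversarialExample]:
    """Keep, for each image, the first question the model answers incorrectly."""
    collected = []
    for image_id in image_ids:
        for attempt in range(max_attempts):
            question = propose_question(image_id, attempt)
            answer = model_predict(image_id, question)
            if is_answer_wrong(image_id, question, answer):
                collected.append(AdversarialExample(image_id, question, answer))
                break  # move on to the next image once the model has been fooled
    return collected


# Toy usage with made-up data: the "model" is a lookup table of canned answers.
questions = ["How many dogs?", "What color is the sign?"]
model_answers = {
    ("img1", "How many dogs?"): "2",
    ("img1", "What color is the sign?"): "2",
    ("img2", "How many dogs?"): "2",
    ("img2", "What color is the sign?"): "blue",
}
true_answers = {
    ("img1", "How many dogs?"): "2",
    ("img1", "What color is the sign?"): "red",
    ("img2", "How many dogs?"): "2",
    ("img2", "What color is the sign?"): "blue",
}

examples = collect_adversarial_examples(
    image_ids=["img1", "img2"],
    model_predict=lambda img, q: model_answers[(img, q)],
    propose_question=lambda img, attempt: questions[attempt % len(questions)],
    is_answer_wrong=lambda img, q, a: true_answers[(img, q)] != a,
)
```

In this toy run only the sign-color question for `img1` is kept, because it is the only question the canned model gets wrong. A model evaluated afterwards on such a collection is scored only on questions that defeated the model used during collection, which is what makes the resulting benchmark a stress test.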

Videos

Human-Adversarial Visual Question Answering · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition