# Why Does a Visual Question Have Different Answers?

**Authors:** Nilavra Bhattacharya, Qing Li, Danna Gurari

arXiv: 1908.04342 · 2019-08-16

## TL;DR

This paper investigates why different people give different answers to the same visual question. It proposes a taxonomy of reasons for answer differences, builds labeled datasets, and develops an algorithm that predicts, directly from a visual question, which reasons will cause its answers to differ.

## Contribution

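To the authors' knowledge, this is the first work to study why answers to a visual question differ. It introduces a taxonomy of nine plausible reasons, two labeled datasets of about 45,000 visual questions annotated with the reasons that led to answer differences, and a novel algorithm that predicts from a visual question which of these reasons will cause answer differences.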
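Because a single visual question can have several reasons for answer differences at once, the prediction problem reads naturally as multi-label classification. The Python below is a minimal sketch of that framing only, not the authors' algorithm. It assumes per-reason scores (logits) come from some upstream model. The reason names (`reason_0` … `reason_8`), the `VisualQuestion` record, the lowercase-and-trim normalization in `answers_differ`, and the 0.5 threshold are all hypothetical illustrations.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

# Hypothetical placeholder labels: the paper's taxonomy has nine reasons,
# but their actual names are not reproduced here.
REASONS: list[str] = [f"reason_{i}" for i in range(9)]


@dataclass
class VisualQuestion:
    """Hypothetical record: one image, one question, many crowd answers."""
    image_id: str
    question: str
    answers: list[str] = field(default_factory=list)


def answers_differ(vq: VisualQuestion) -> bool:
    """Return True if the crowd answers disagree after light normalization.

    Normalization here is only lowercasing and trimming whitespace,
    an assumption made for illustration.
    """
    normalized = {a.strip().lower() for a in vq.answers}
    return len(normalized) > 1


def to_multi_hot(reasons: list[str]) -> list[int]:
    """Encode a set of reason labels as a 0/1 vector over REASONS."""
    unknown = set(reasons) - set(REASONS)
    if unknown:
        raise ValueError(f"unknown reasons: {sorted(unknown)}")
    return [1 if r in reasons else 0 for r in REASONS]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_reasons(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Multi-label decision: each reason is kept independently if its
    sigmoid score reaches the threshold, so zero, one, or several
    reasons can be predicted for the same visual question."""
    if len(logits) != len(REASONS):
        raise ValueError(f"expected {len(REASONS)} logits, got {len(logits)}")
    return [r for r, z in zip(REASONS, logits) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    vq = VisualQuestion("img_001", "What color is the shirt?", ["blue", "Blue ", "navy"])
    print(answers_differ(vq))  # True: "blue" vs "navy"
    print(to_multi_hot(["reason_0", "reason_5"]))  # [1, 0, 0, 0, 0, 1, 0, 0, 0]
    logits = [2.0, -1.0, 0.3, -3.0, -0.5, 1.2, -2.0, -4.0, -0.1]
    print(predict_reasons(logits))  # ['reason_0', 'reason_2', 'reason_5']
```

Thresholding each reason independently is one simple way to let several reasons apply to the same visual question; a single softmax over reasons would instead force exactly one choice per question.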

## Key findings

- The proposed algorithm outperforms several related baselines on two diverse datasets.
- The datasets and code are publicly available at https://vizwiz.org.
- The nine-reason taxonomy offers a structured way to characterize why answers vary in visual question answering.

## Abstract

Visual question answering is the task of returning the answer to a question about an image. A challenge is that different people often provide different answers to the same visual question. To our knowledge, this is the first work that aims to understand why. We propose a taxonomy of nine plausible reasons, and create two labelled datasets consisting of ~45,000 visual questions indicating which reasons led to answer differences. We then propose a novel problem of predicting directly from a visual question which reasons will cause answer differences as well as a novel algorithm for this purpose. Experiments demonstrate the advantage of our approach over several related baselines on two diverse datasets. We publicly share the datasets and code at https://vizwiz.org.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.04342/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/1908.04342/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/1908.04342/full.md

---
Source: https://tomesphere.com/paper/1908.04342