# What's in a Question: Using Visual Questions as a Form of Supervision

**Authors:** Siddha Ganju, Olga Russakovsky, Abhinav Gupta

arXiv: 1704.03895 · 2017-04-14

## TL;DR

This paper explores using unanswered visual questions as a new form of weak supervision for image understanding, demonstrating that questions alone can provide valuable information and improve model performance.

## Contribution

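The paper analyzes, qualitatively and quantitatively, the information contained in human visual questions; proposes two simple modifications to standard VQA models that let them use unanswered questions as weak supervision; and reports a 7.1% improvement on the standard VQA benchmark from a data augmentation strategy inspired by that analysis.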

## Key findings

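- A question carries information about its image even when no answer is available: "what is the breed of the dog?" implies that the scene contains a dog, and only one.
- Two simple modifications to standard VQA models let them use unanswered questions as weak supervision.
- A simple data augmentation strategy inspired by these insights yields a 7.1% improvement on the standard VQA benchmark.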
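
The first finding can be made concrete with a toy example. The sketch below is not the authors' method; it is a minimal illustration, under assumed rules, of how an unanswered question can be turned into weak labels about an image. The vocabulary (`SINGULAR`, `PLURAL`), the function name `question_cues`, and the heuristic that "the" before a singular noun suggests exactly one instance are all assumptions made for illustration.

```python
import re

# Hypothetical toy vocabulary; a real system would need a far larger
# lexicon or a learned model. These names are illustrative assumptions.
SINGULAR = {"dog", "cat", "umbrella"}
PLURAL = {"dogs": "dog", "cats": "cat", "umbrellas": "umbrella"}


def question_cues(question):
    """Return (object, count_hint) pairs implied by a question alone."""
    tokens = re.findall(r"[a-z]+", question.lower())
    cues = []
    # Pair each token with the token before it ("" for the first token).
    for prev, tok in zip([""] + tokens, tokens):
        if tok in SINGULAR:
            # Assumed heuristic: "the dog" suggests a single dog.
            hint = "one" if prev == "the" else "at least one"
            cues.append((tok, hint))
        elif tok in PLURAL:
            cues.append((PLURAL[tok], "several"))
    return cues


print(question_cues("What is the breed of the dog?"))
```

No image and no answer are used here: the labels come from the question text alone, which is the sense in which a question can act as weak supervision.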

## Abstract

Collecting fully annotated image datasets is challenging and expensive. Many types of weak supervision have been explored: weak manual annotations, web search results, temporal continuity, ambient sound and others. We focus on one particular unexplored mode: visual questions that are asked about images. The key observation that inspires our work is that the question itself provides useful information about the image (even without the answer being available). For instance, the question "what is the breed of the dog?" informs the AI that the animal in the scene is a dog and that there is only one dog present. We make three contributions: (1) providing an extensive qualitative and quantitative analysis of the information contained in human visual questions, (2) proposing two simple but surprisingly effective modifications to the standard visual question answering models that allow them to make use of weak supervision in the form of unanswered questions associated with images and (3) demonstrating that a simple data augmentation strategy inspired by our insights results in a 7.1% improvement on the standard VQA benchmark.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.03895/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1704.03895/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/1704.03895/full.md

---
Source: https://tomesphere.com/paper/1704.03895