Visual Question Answering as a Meta Learning Task

Damien Teney; Anton van den Hengel

arXiv:1711.08105·cs.CV·November 23, 2017

TL;DR

This paper introduces a meta learning approach to Visual Question Answering (VQA), enabling models to adapt to new questions and answers at test time using support sets, thus improving flexibility and sample efficiency.

Contribution

The paper proposes a meta learning framework for VQA that separates the question answering method from the knowledge it draws on, so the support set can be extended at test time without retraining the model.

Findings

01. Capable of producing novel answers unseen during training

02. Higher recall of rare answers compared to state-of-the-art

03. Improved sample efficiency with less initial data

Abstract

The predominant approach to Visual Question Answering (VQA) demands that the model represents within its weights all of the information required to answer any question about any image. Learning this information from any real training set seems unlikely, and representing it in a reasonable number of weights doubly so. We propose instead to approach VQA as a meta learning task, thus separating the question answering method from the information required. At test time, the method is provided with a support set of example questions/answers, over which it reasons to resolve the given question. The support set is not fixed and can be extended without retraining, thereby expanding the capabilities of the model. To exploit this dynamically provided information, we adapt a state-of-the-art VQA model with two techniques from the recent meta learning literature, namely prototypical networks and…
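
To make the support-set idea concrete, here is a minimal sketch of prototype-based answer prediction in the general style of prototypical networks. It is not the authors' implementation: the two-dimensional embeddings, the answer labels, and the helper names `build_prototypes` and `predict` are all hypothetical, and a real system would obtain embeddings from a trained image/question encoder. The sketch only illustrates how adding examples to the support set can introduce a new answer without changing any learned weights.

```python
from collections import defaultdict


def build_prototypes(support):
    """Average the embeddings of each answer in the support set.

    `support` is a list of (embedding, answer) pairs. The embeddings are
    assumed to come from some fixed encoder (hypothetical here).
    """
    sums = {}
    counts = defaultdict(int)
    for emb, ans in support:
        if ans not in sums:
            sums[ans] = list(emb)
        else:
            sums[ans] = [s + e for s, e in zip(sums[ans], emb)]
        counts[ans] += 1
    return {ans: [s / counts[ans] for s in total] for ans, total in sums.items()}


def predict(query, prototypes):
    """Return the answer whose prototype is nearest in squared Euclidean distance."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(prototypes, key=lambda ans: sq_dist(query, prototypes[ans]))


# Toy support set with two known answers (illustrative values only).
support = [
    ([1.0, 0.0], "red"),
    ([0.8, 0.2], "red"),
    ([0.0, 1.0], "blue"),
]
query = [-0.9, -0.7]

before = predict(query, build_prototypes(support))

# Extend the support set with an answer never seen before; no retraining step.
support.append(([-1.0, -1.0], "green"))
after = predict(query, build_prototypes(support))

print(before, after)
```

In this toy setup the same query is assigned to an existing answer before the extension and to the newly added answer afterwards, which mirrors the abstract's point that the support set "can be extended without retraining".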
