Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources

Qi Wu; Peng Wang; Chunhua Shen; Anthony Dick; Anton van den Hengel

arXiv:1511.06973·cs.CV·April 15, 2016·47 cites

TL;DR

This paper introduces a flexible visual question answering system that combines image content with external knowledge bases, enabling it to answer complex, natural language questions about images.

Contribution

The paper presents a method that merges a textual representation of an image's semantic content with text drawn from an external knowledge base, improving neural network-based visual question answering.

Findings

01. Achieved state-of-the-art results on the Toronto COCO-QA dataset.

02. Achieved state-of-the-art results on the MS COCO-VQA dataset.

03. Demonstrated the ability to answer questions that go beyond the image content by using external knowledge.

Abstract

We propose a method for visual question answering which combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. This allows more complex questions to be answered using the predominant neural network-based approach than has previously been possible. It particularly allows questions to be asked about the contents of an image, even when the image itself does not contain the whole answer. The method constructs a textual representation of the semantic content of an image, and merges it with textual information sourced from a knowledge base, to develop a deeper understanding of the scene viewed. Priming a recurrent neural network with this combined information, and the submitted question, leads to a very flexible visual question answering approach. We are specifically able to…
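
The data flow the abstract describes (image content as text, merged with knowledge-base text, then combined with the question) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `TOY_KNOWLEDGE` dictionary stands in for a real external knowledge base, the attribute scores stand in for the output of an image model, and `top_attributes` and `build_context` are hypothetical helpers. The sketch only assembles a combined text string; how the paper actually encodes this information before priming the recurrent network is not stated in the text above.

```python
from typing import Dict, List

# Hypothetical toy knowledge base: attribute name -> short textual fact.
# A stand-in for the general external knowledge base mentioned in the abstract.
TOY_KNOWLEDGE: Dict[str, str] = {
    "umbrella": "An umbrella is a canopy designed to protect against rain or sunlight.",
    "dog": "The dog is a domesticated mammal.",
}


def top_attributes(scores: Dict[str, float], k: int = 2) -> List[str]:
    """Return the k highest-scoring attribute names (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def build_context(scores: Dict[str, float], question: str, k: int = 2) -> str:
    """Merge image attributes, looked-up facts, and the question into one text string."""
    attrs = top_attributes(scores, k)
    # Keep only facts for attributes the toy knowledge base actually covers.
    facts = [TOY_KNOWLEDGE[a] for a in attrs if a in TOY_KNOWLEDGE]
    parts = ["Image content: " + ", ".join(attrs) + "."]
    parts.extend(facts)
    parts.append("Question: " + question)
    return " ".join(parts)


if __name__ == "__main__":
    # Assumed attribute scores, as if produced by an image model.
    example_scores = {"dog": 0.9, "umbrella": 0.8, "car": 0.1}
    print(build_context(example_scores, "Why is the person holding an umbrella?"))
```

In a full system, a string like this (or a learned encoding of it) would be passed to the answer-generating network; the plain concatenation used here is chosen only to make the merging step visible.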

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning