Exploring Models and Data for Image Question Answering

Mengye Ren; Ryan Kiros; Richard Zemel

arXiv:1505.02074 · cs.LG · December 1, 2015 · 384 cites

TL;DR

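This paper introduces neural network models and a new dataset for image question answering. The model performs 1.8 times better than the only previously published results on an existing image QA dataset, and a question generation algorithm converts image descriptions into an order-of-magnitude larger dataset with more evenly distributed answers.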

Contribution

It proposes a neural network approach without intermediate detection steps and a question generation algorithm to create extensive QA datasets from image descriptions.

Findings

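01 The model performs 1.8 times better than the only previously published results on an existing image QA dataset.

02 The question generation algorithm produces an order-of-magnitude larger QA dataset with more evenly distributed answers.

03 A suite of baseline results on the new dataset is provided.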

Abstract

This work aims to address the problem of image-based question-answering (QA) with new models and datasets. In our work, we propose to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images. Our model performs 1.8 times better than the only published results on an existing image QA dataset. We also present a question generation algorithm that converts image descriptions, which are widely available, into QA form. We used this algorithm to produce an order-of-magnitude larger dataset, with more evenly distributed answers. A suite of baseline results on this new dataset is also presented.
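
To make the idea of converting an image description into QA form concrete, here is a minimal toy sketch. It is not the authors' algorithm: the single sentence pattern, the regular expression, and the function name `description_to_qa` are all assumptions made for illustration, and it handles only descriptions shaped like "A man is riding a horse."

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern (an assumption, not taken from the paper):
# "<article> <subject> is <verb>ing <article> <object>"
_PATTERN = re.compile(
    r"^(?P<subj>(?:a|an|the)\s+\w+)\s+is\s+(?P<verb>\w+ing)\s+"
    r"(?:a|an|the)\s+(?P<obj>\w+)\.?$",
    re.IGNORECASE,
)


def description_to_qa(description: str) -> Optional[Tuple[str, str]]:
    """Turn one simple description into a (question, answer) pair.

    Returns None when the description does not fit the toy pattern.
    """
    match = _PATTERN.match(description.strip())
    if match is None:
        return None
    # Keep only the subject noun, dropping its article.
    noun = match.group("subj").lower().split()[-1]
    verb = match.group("verb").lower()
    # The object of the sentence becomes the one-word answer.
    answer = match.group("obj").lower()
    question = f"what is the {noun} {verb}?"
    return question, answer


if __name__ == "__main__":
    print(description_to_qa("A man is riding a horse."))
    print(description_to_qa("Two dogs play in the park."))
```

The sketch replaces the object of the sentence with a question word and keeps the removed word as the answer, which is one simple way a description can yield a QA pair. Descriptions that fall outside the pattern are skipped rather than guessed at.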

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques