Exploring Models and Data for Image Question Answering
Mengye Ren, Ryan Kiros, Richard Zemel

TL;DR
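This paper introduces neural network models and a new dataset for image question answering. The model performs 1.8 times better than the only previously published results on an existing image QA dataset, and a question generation algorithm converts image descriptions into an order-of-magnitude larger dataset with more evenly distributed answers.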
Contribution
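It proposes a neural network approach, built on visual semantic embeddings, that predicts answers to simple questions about images without intermediate stages such as object detection and image segmentation. It also proposes a question generation algorithm that converts widely available image descriptions into QA form.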
Findings
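The model performs 1.8 times better than the only previously published results on an existing image QA dataset.
The question generation algorithm produced an order-of-magnitude larger QA dataset with more evenly distributed answers.
A suite of baseline results on the new dataset is provided.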
Abstract
This work aims to address the problem of image-based question-answering (QA) with new models and datasets. In our work, we propose to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images. Our model performs 1.8 times better than the only published results on an existing image QA dataset. We also present a question generation algorithm that converts image descriptions, which are widely available, into QA form. We used this algorithm to produce an order-of-magnitude larger dataset, with more evenly distributed answers. A suite of baseline results on this new dataset is also presented.
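As a rough illustration of what converting an image description into QA form can look like, the sketch below turns colour-plus-noun phrases into question and answer pairs. It is a toy, rule-based stand-in and not the authors' algorithm: the function name `color_questions`, the `COLORS` vocabulary, and the question template are all assumptions made for this example.

```python
import re

# Hypothetical colour vocabulary, chosen only for this example.
COLORS = {"red", "green", "blue", "yellow", "white", "black", "brown", "orange"}


def color_questions(description):
    """Turn '<color> <noun>' phrases in a description into (question, answer) pairs.

    Toy heuristic: any word that directly follows a colour word is treated
    as the noun being described. No parsing or tagging is performed.
    """
    words = re.findall(r"[a-z]+", description.lower())
    pairs = []
    # Walk over adjacent word pairs and keep those that start with a colour.
    for color, noun in zip(words, words[1:]):
        if color in COLORS and noun not in COLORS:
            pairs.append((f"what is the color of the {noun} ?", color))
    return pairs


if __name__ == "__main__":
    for question, answer in color_questions("A man in a red shirt rides a brown horse."):
        print(question, "->", answer)
```

Each description can yield several pairs, which is one way a description corpus can grow into a much larger QA set. A real system would need syntactic analysis to handle other question types and to avoid pairing a colour with a word that is not a noun.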
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
