Ask Your Neurons: A Deep Learning Approach to Visual Question Answering

Mateusz Malinowski; Marcus Rohrbach; Mario Fritz

arXiv:1605.02697 · cs.CV · November 28, 2016

TL;DR

This paper introduces 'Ask Your Neurons', a deep learning model for visual question answering that combines image and language understanding, analyzes human consensus, and improves performance on the DAQUAR dataset.

Contribution

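It presents a scalable, jointly trained, end-to-end multi-modal model for visual question answering, and introduces two new consensus metrics together with DAQUAR-Consensus, an extension of the DAQUAR dataset with additional human answers, to analyze human consensus and model performance.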

Findings

01. Strong model performance using global image representations

02. Analysis of language-only information with a new human baseline

03. Enhanced dataset with consensus answers improves evaluation

Abstract

We address a question answering task on real-world images that is set up as a Visual Turing Test. By combining latest advances in image representation and natural language processing, we propose Ask Your Neurons, a scalable, jointly trained, end-to-end formulation to this problem. In contrast to previous efforts, we are facing a multi-modal problem where the language output (answer) is conditioned on visual and natural language inputs (image and question). We provide additional insights into the problem by analyzing how much information is contained only in the language part for which we provide a new human baseline. To study human consensus, which is related to the ambiguities inherent in this challenging task, we propose two novel metrics and collect additional answers which extend the original DAQUAR dataset to DAQUAR-Consensus. Moreover, we also extend our analysis to VQA, a…
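The multi-modal setup described in the abstract, where the answer is conditioned on both an image and a question, can be illustrated with a toy sketch. This is not the authors' implementation: the class name `ToyAnswerer`, the dimensions, the vocabularies, and the use of a plain tanh recurrence with untrained random weights are all assumptions made for illustration. It only shows the data flow: image features and question words are fused in a recurrent state, and a softmax over an answer vocabulary yields the predicted answer.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class ToyAnswerer:
    """Hypothetical, untrained model: answer conditioned on image + question."""

    def __init__(self, question_vocab, answer_vocab,
                 image_dim=4, hidden_dim=8, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.answer_vocab = list(answer_vocab)
        self.word_index = {w: i for i, w in enumerate(question_vocab)}
        self.hidden_dim = hidden_dim
        self.embed = rand_matrix(len(question_vocab), hidden_dim)   # word embeddings
        self.w_image = rand_matrix(hidden_dim, image_dim)           # image projection
        self.w_hidden = rand_matrix(hidden_dim, hidden_dim)         # recurrent weights
        self.w_out = rand_matrix(len(self.answer_vocab), hidden_dim)  # answer classifier

    def answer_distribution(self, image_features, question):
        # Project the global image feature vector once; it is fed in at every step.
        image_term = matvec(self.w_image, image_features)
        hidden = [0.0] * self.hidden_dim
        for word in question.lower().split():
            word_vec = self.embed[self.word_index[word]]
            recurrent = matvec(self.w_hidden, hidden)
            # Simple tanh recurrence (an assumption, standing in for a learned encoder).
            hidden = [math.tanh(r + e + i)
                      for r, e, i in zip(recurrent, word_vec, image_term)]
        return softmax(matvec(self.w_out, hidden))

    def answer(self, image_features, question):
        probs = self.answer_distribution(image_features, question)
        best = max(range(len(probs)), key=probs.__getitem__)
        return self.answer_vocab[best]


# Usage with made-up vocabularies and a made-up 4-dimensional image feature.
model = ToyAnswerer(["what", "is", "on", "the", "table"],
                    ["cup", "chair", "lamp"])
image = [0.2, 0.1, 0.7, 0.0]
probs = model.answer_distribution(image, "what is on the table")
predicted = model.answer(image, "what is on the table")
```

Because the weights are random, the predicted answer is meaningless; in a trained system the image encoder, question encoder, and answer classifier would be learned jointly from question-answer pairs.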

Code & Models

Repositories

mateuszmalinowski/visual_turing_test-tutorial (Official)
