Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks

Guohao Li; Hang Su; Wenwu Zhu

arXiv:1712.00733 · cs.CV · December 5, 2017 · 41 cites

TL;DR

This paper introduces a novel framework that enhances open-domain visual question answering by integrating external knowledge with dynamic memory networks to perform complex reasoning.

Contribution

It presents a new method that combines external knowledge retrieval and dynamic memory networks for improved reasoning in VQA tasks.

Findings

01. Achieves state-of-the-art performance on VQA benchmarks.

02. Effectively leverages external knowledge for open-domain questions.

03. Demonstrates improved reasoning capabilities over existing models.

Abstract

Visual Question Answering (VQA) has attracted much attention since it offers insight into the relationships between the multi-modal analysis of images and natural language. Most current algorithms cannot answer open-domain questions that require reasoning beyond the image contents. To address this issue, we propose a novel framework that endows the model with the capability to answer more complex questions by leveraging massive external knowledge with dynamic memory networks. Specifically, the questions, along with the corresponding images, trigger a process that retrieves relevant information from external knowledge bases, which is embedded into a continuous vector space in a way that preserves the entity-relation structures. Afterwards, we employ dynamic memory networks to attend to the large body of facts in the knowledge graph and images, and then perform reasoning…
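The attention step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the scoring function or the memory update, so the dot-product scoring, the averaging update, the function names (`attend`, `episodic_memory`), and the toy fact vectors below are all assumptions chosen for illustration. Dynamic memory networks as usually described rely on learned gates and recurrent updates, which are replaced here by fixed arithmetic.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(facts, question, memory):
    """One attention pass over the fact embeddings.

    Assumption: each fact is scored by its dot product with the question
    plus its dot product with the current memory (a stand-in for a
    learned gating function).
    """
    scores = [dot(f, question) + dot(f, memory) for f in facts]
    weights = softmax(scores)
    dim = len(question)
    context = [sum(w * f[i] for w, f in zip(weights, facts)) for i in range(dim)]
    return weights, context


def episodic_memory(facts, question, passes=2):
    """Run several attention passes, refining the memory each time."""
    memory = list(question)  # initialise the memory with the question vector
    weights = []
    for _ in range(passes):
        weights, context = attend(facts, question, memory)
        # Assumption: simple averaging replaces the learned recurrent update.
        memory = [0.5 * m + 0.5 * c for m, c in zip(memory, context)]
    return weights, memory


# Toy, hand-made embeddings standing in for retrieved knowledge-base facts.
facts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
question = [0.0, 2.0, 0.0]
weights, memory = episodic_memory(facts, question)
```

In this toy setup the second fact is aligned with the question vector, so it receives the largest attention weight on every pass. In a trained model the embeddings and the update rule would be learned rather than fixed by hand.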

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques