Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks
Guohao Li, Hang Su, Wenwu Zhu

TL;DR
This paper introduces a novel framework that enhances open-domain visual question answering by integrating external knowledge with dynamic memory networks to perform complex reasoning.
Contribution
It presents a new method that combines external knowledge retrieval and dynamic memory networks for improved reasoning in VQA tasks.
Findings
Achieves state-of-the-art performance on VQA benchmarks.
Effectively leverages external knowledge for open-domain questions.
Demonstrates improved reasoning capabilities over existing models.
Abstract
Visual Question Answering (VQA) has attracted much attention since it offers insight into the relationships between the multi-modal analysis of images and natural language. Most of the current algorithms are incapable of answering open-domain questions that require to perform reasoning beyond the image contents. To address this issue, we propose a novel framework which endows the model capabilities in answering more complex questions by leveraging massive external knowledge with dynamic memory networks. Specifically, the questions along with the corresponding images trigger a process to retrieve the relevant information in external knowledge bases, which are embedded into a continuous vector space by preserving the entity-relation structures. Afterwards, we employ dynamic memory networks to attend to the large body of facts in the knowledge graph and images, and then perform reasoning…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
