Dynamic Memory Networks for Visual and Textual Question Answering
Caiming Xiong, Stephen Merity, Richard Socher

TL;DR
This paper introduces an improved dynamic memory network model that enhances question answering capabilities across text and visual data, achieving state-of-the-art results without requiring supporting fact annotations.
Contribution
The paper proposes several improvements to the DMN architecture, including a new input module for images, enabling effective visual question answering without supporting fact supervision.
Findings
Achieves state-of-the-art results on the Visual Question Answering (VQA) dataset.
Improves the state of the art on the bAbI-10k text question-answering dataset without supporting fact supervision.
Shows that one architecture, with a modality-specific input module, handles both text and images.
Abstract
Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.
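To make the idea of a memory module with attention more concrete, here is a minimal sketch of multi-pass soft attention over a set of fact vectors. It is an illustration only and not the DMN+ model from the paper: the dot-product scoring in `attend`, the averaging memory update, and all function names are assumptions chosen for brevity, and nothing here is learned. The facts could stand for encoded sentences or for image-region features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(facts, question, memory):
    """Score each fact against the question and the current memory.

    Assumption: plain dot-product similarity stands in for a learned
    attention function.
    """
    scores = [dot(f, question) + dot(f, memory) for f in facts]
    weights = softmax(scores)
    dim = len(question)
    # Context is the attention-weighted sum of the fact vectors.
    context = [sum(w * f[i] for w, f in zip(weights, facts)) for i in range(dim)]
    return weights, context


def episodic_memory(facts, question, passes=2):
    """Run several attention passes, refining the memory each time.

    Assumption: a fixed 50/50 average stands in for a learned memory update.
    """
    memory = list(question)  # initialise memory with the question vector
    weights = []
    for _ in range(passes):
        weights, context = attend(facts, question, memory)
        memory = [0.5 * m + 0.5 * c for m, c in zip(memory, context)]
    return memory, weights


facts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question = [1.0, 0.0]
memory, weights = episodic_memory(facts, question)
print(weights)  # attention weights over the three facts, summing to 1
```

Because the attention weights come from a softmax, no fact needs to be labeled as "supporting" in advance; in a trained model the weights would be shaped by the answer loss alone, which is the setting the abstract describes.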
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Image and Video Retrieval Techniques
Methods: Softmax · Gated Recurrent Unit · Dynamic Memory Network · Memory Network
