Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering

Chengxiang Yin; Zhengping Che; Kun Wu; Zhiyuan Xu; Jian Tang

arXiv:2312.12723·cs.CV·December 21, 2023·1 cites

Open Access

TL;DR

This paper introduces MCR-MemNN, a novel memory-augmented framework for knowledge-based visual question answering that effectively integrates external knowledge, visual content, and questions to improve answer accuracy.

Contribution

It proposes a new multi-clue reasoning framework with memory networks that better exploits external knowledge for KB-VQA tasks.

Findings

01. Outperforms existing KB-VQA methods on benchmark datasets.

02. Effectively integrates visual, question, and external knowledge modalities.

03. Demonstrates significant improvement in answer accuracy.

Abstract

Visual Question Answering (VQA) has emerged as one of the most challenging tasks in artificial intelligence due to its multi-modal nature. However, most existing VQA methods cannot handle Knowledge-based Visual Question Answering (KB-VQA), which requires external knowledge beyond the visible content to answer questions about a given image. To address this issue, we propose a novel framework that enables the model to answer more general questions and better exploits external knowledge by generating Multiple Clues for Reasoning with Memory Neural Networks (MCR-MemNN). Specifically, a well-defined detector is adopted to predict image-question related relation phrases, each of which delivers two complementary clues to retrieve the supporting facts from an external knowledge base (KB), which are further encoded into a continuous embedding…
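The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract does not say what the two complementary clues are, so this sketch assumes a relation phrase is a (subject, relation, object) triple and that the clues are the (subject, relation) and (relation, object) pairs. The function names, the toy knowledge base, and the exact-match lookup are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A fact or relation phrase is modeled as a (subject, relation, object) triple.
# This representation is an assumption made for illustration only.
Triple = Tuple[str, str, str]


def clues_from_phrase(phrase: Triple) -> Tuple[Tuple[str, str], Tuple[str, str]]:
    """Split one relation phrase into two partial clues (assumed form)."""
    subject, relation, obj = phrase
    return (subject, relation), (relation, obj)


def retrieve_facts(kb: List[Triple], phrase: Triple) -> List[Triple]:
    """Return every KB fact that matches at least one of the two clues."""
    head_clue, tail_clue = clues_from_phrase(phrase)
    facts = []
    for subject, relation, obj in kb:
        if (subject, relation) == head_clue or (relation, obj) == tail_clue:
            facts.append((subject, relation, obj))
    return facts


# Hypothetical toy knowledge base, not data from the paper.
TOY_KB: List[Triple] = [
    ("cat", "is_a", "mammal"),
    ("dog", "is_a", "mammal"),
    ("cat", "eats", "fish"),
    ("banana", "is_a", "fruit"),
]

supporting = retrieve_facts(TOY_KB, ("cat", "is_a", "mammal"))
print(supporting)
```

Under these assumptions, a fact is retrieved when it agrees with the phrase on either pair, so the second clue can surface facts about other subjects that share the same relation and object. How the retrieved facts are then encoded into the embedding mentioned above is not covered by this sketch.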

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Balanced Selection