R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering

Pan Lu; Lei Ji; Wei Zhang; Nan Duan; Ming Zhou; Jianyong Wang

arXiv:1805.09701·cs.CV·July 23, 2018

TL;DR

This paper introduces R-VQA, a framework that leverages visual relation facts with semantic attention to improve visual question answering, achieving state-of-the-art results by integrating semantic knowledge and visual relations.

Contribution

The paper proposes a novel R-VQA framework that learns and utilizes visual relation facts with semantic attention, enhancing VQA performance beyond existing methods.

Findings

01. Achieves state-of-the-art results on benchmark datasets.

02. Demonstrates the effectiveness of visual relation facts in VQA.

03. Shows the benefits of semantic attention in integrating knowledge.

Abstract

Recently, Visual Question Answering (VQA) has emerged as one of the most significant tasks in multimodal learning, as it requires understanding both visual and textual modalities. Existing methods mainly rely on extracting image and question features to learn their joint feature embedding via multimodal fusion or attention mechanisms. Some recent studies utilize external VQA-independent models to detect candidate entities or attributes in images, which serve as semantic knowledge complementary to the VQA task. However, these candidate entities or attributes might be unrelated to the VQA task and have limited semantic capacity. To better utilize semantic knowledge in images, we propose a novel framework to learn visual relation facts for VQA. Specifically, we build a Relation-VQA (R-VQA) dataset based on the Visual Genome dataset via a semantic similarity module, in which each data…
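To make the idea of attending over semantic knowledge more concrete, here is a minimal sketch of question-guided attention over a handful of candidate relation facts. It is not the paper's architecture: it uses generic scaled dot-product attention, and the function names (`softmax`, `attend`), the three-dimensional vectors, and the example fact labels are all hypothetical, chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, facts):
    """Weight each fact vector by its scaled dot product with the query.

    Returns the attention weights and the weighted sum of fact vectors.
    This is generic attention, assumed here for illustration only.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * f for q, f in zip(query, fact)) / scale for fact in facts]
    weights = softmax(scores)
    dim = len(facts[0])
    context = [sum(w * fact[i] for w, fact in zip(weights, facts)) for i in range(dim)]
    return weights, context


# Toy, made-up embeddings: one question vector and three relation-fact vectors.
question_vec = [1.0, 0.0, 0.5]
fact_vecs = [
    [0.9, 0.1, 0.4],   # hypothetical fact, e.g. (man, riding, horse)
    [-0.5, 0.8, 0.0],  # hypothetical fact unrelated to the question
    [0.0, 0.0, 0.1],   # hypothetical near-neutral fact
]

weights, context = attend(question_vec, fact_vecs)
print(weights)   # one weight per fact, summing to 1
print(context)   # attended fact representation
```

In a full model, a vector like `context` would typically be fused with image and question features before answer prediction; how R-VQA does this is described in the paper itself, not in this sketch.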

Code & Models

Repositories

lupantech/rvqa (official)
