Relation-Aware Graph Attention Network for Visual Question Answering

Linjie Li; Zhe Gan; Yu Cheng; Jingjing Liu

arXiv:1903.12314 · cs.CV · October 11, 2019 · 56 cites

TL;DR

This paper introduces ReGAT, a relation-aware graph attention network that models object interactions in images to improve visual question answering accuracy, outperforming previous methods on standard datasets.

Contribution

The paper presents a novel graph attention network that encodes multi-type object relations for VQA, enhancing understanding of complex visual scenes.

Findings

01. ReGAT outperforms prior state-of-the-art models on the VQA 2.0 and VQA-CP v2 datasets.

02. ReGAT models both explicit and implicit object relations.

03. ReGAT is compatible with existing VQA architectures and can serve as a generic relation encoder.

Abstract

To answer semantically complicated questions about an image, a Visual Question Answering (VQA) model needs to fully understand the visual scene in the image, especially the interactive dynamics between different objects. We propose a Relation-aware Graph Attention Network (ReGAT), which encodes each image into a graph and models multi-type inter-object relations via a graph attention mechanism, to learn question-adaptive relation representations. Two types of visual object relations are explored: (i) Explicit Relations that represent geometric positions and semantic interactions between objects; and (ii) Implicit Relations that capture the hidden dynamics between image regions. Experiments demonstrate that ReGAT outperforms prior state-of-the-art approaches on both VQA 2.0 and VQA-CP v2 datasets. We further show that ReGAT is compatible with existing VQA architectures, and can be…
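The graph attention idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the official repository for that). It assumes, for illustration only, that the question embedding is concatenated to every object feature, that a single attention head with scaled dot-product scoring is used, and that the relation graph is given as a boolean adjacency mask. The function name `question_adaptive_graph_attention` and the weight matrices `w_q`, `w_k`, `w_v` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax; entries equal to -inf get weight 0."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def question_adaptive_graph_attention(obj_feats, question, adj, w_q, w_k, w_v):
    """One attention pass over an object graph (illustrative sketch only).

    obj_feats: (n, d_v) region features, one row per detected object
    question:  (d_q,)   question embedding
    adj:       (n, n)   boolean mask, adj[i, j] is True if j is a neighbor of i
    w_q, w_k, w_v: (d_v + d_q, d_h) projection matrices (hypothetical names)
    Returns the relation-aware features (n, d_h) and attention weights (n, n).
    """
    n = obj_feats.shape[0]

    # Assumption: make node features question-adaptive by appending the
    # question embedding to every object feature.
    nodes = np.concatenate([obj_feats, np.tile(question, (n, 1))], axis=1)

    queries = nodes @ w_q
    keys = nodes @ w_k
    values = nodes @ w_v

    # Scaled dot-product scores between every pair of objects.
    scores = queries @ keys.T / np.sqrt(queries.shape[1])

    # Restrict attention to graph neighbors; self-loops keep each row non-empty.
    mask = adj | np.eye(n, dtype=bool)
    scores = np.where(mask, scores, -np.inf)

    attn = softmax(scores, axis=1)
    return attn @ values, attn
```

A fully connected mask (all `True`) corresponds loosely to the implicit-relation setting, where the graph structure is learned through the attention weights, while a sparse mask built from spatial or semantic edges corresponds loosely to the explicit-relation setting. Different relation types would need separate masks or edge-specific parameters, which this sketch omits.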

Code & Models

Repositories

linjieli222/VQA_ReGAT (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning