Joint learning of object graph and relation graph for visual question answering

Hao Li; Xu Li; Belhal Karimi; Jie Chen; Mingming Sun

arXiv:2205.04188·cs.CV·May 10, 2022·1 cites

Open Access

TL;DR

This paper introduces a Dual Message-passing enhanced Graph Neural Network (DM-GNN) for visual question answering that balances scene graph information, improving reasoning accuracy and interpretability, especially for complex questions involving attributes and relations.

Contribution

The paper proposes a novel DM-GNN model that transforms scene graphs into object-focused and relation-focused graphs, with dual encoding and message passing to better integrate attributes and relations.
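The graph transformation described above can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the authors' implementation: the function name `build_dual_graphs`, the triple format, and the rule that two relations are linked when they share an object (a line-graph style construction) are all hypothetical choices made here for clarity.

```python
from collections import defaultdict
from itertools import combinations


def build_dual_graphs(triples):
    """Split a scene graph into an object-focused and a relation-focused graph.

    triples: list of (subject_index, predicate, object_index).
    NOTE: hypothetical construction for illustration; the paper's exact
    transformation may differ.
    """
    # Object-focused graph: nodes are objects, edges carry predicate labels.
    object_graph = defaultdict(list)
    for subj, pred, obj in triples:
        object_graph[subj].append((obj, pred))

    # Record which relations touch each object.
    touching = defaultdict(list)
    for rel_id, (subj, _, obj) in enumerate(triples):
        touching[subj].append(rel_id)
        if obj != subj:
            touching[obj].append(rel_id)

    # Relation-focused graph: nodes are relations; two relations are
    # connected when they share at least one object (assumed rule).
    relation_edges = set()
    for rel_ids in touching.values():
        for a, b in combinations(rel_ids, 2):
            relation_edges.add((min(a, b), max(a, b)))

    return dict(object_graph), sorted(relation_edges)


# Toy scene: objects 0 = man, 1 = horse, 2 = hat.
triples = [(0, "riding", 1), (0, "wearing", 2)]
object_graph, relation_edges = build_dual_graphs(triples)
print(object_graph)
print(relation_edges)
```

In this toy scene, both relations involve the man, so they become neighbours in the relation-focused graph. A dual encoder could then pass messages over each graph separately before fusing the results.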

Findings

1. Achieves state-of-the-art results on GQA, VG, and motif-VG datasets.

2. Improves reasoning accuracy for complex questions involving attributes and relations.

3. Enhances interpretability of VQA models through balanced scene graph encoding.

Abstract

Modeling visual question answering (VQA) through scene graphs can significantly improve reasoning accuracy and interpretability. However, existing models perform poorly on complex reasoning questions involving attributes or relations, which leads to false attribute selection or missing relations, as in Figure 1(a). This is because these models cannot balance all kinds of information in scene graphs and neglect relation and attribute information. In this paper, we introduce a novel Dual Message-passing enhanced Graph Neural Network (DM-GNN), which can obtain a balanced representation by properly encoding multi-scale scene graph information. Specifically, we (i) transform the scene graph into two graphs with diversified focuses on objects and relations, then design a dual structure to encode them, which increases the weights from relations; (ii) fuse the encoder output with attribute features, which…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Advanced Image and Video Retrieval Techniques

Methods: Graph Neural Network