Joint learning of object graph and relation graph for visual question answering
Hao Li, Xu Li, Belhal Karimi, Jie Chen, Mingming Sun

TL;DR
This paper introduces a Dual Message-passing enhanced Graph Neural Network (DM-GNN) for visual question answering that balances scene graph information, improving reasoning accuracy and interpretability, especially for complex questions involving attributes and relations.
Contribution
The paper proposes a novel DM-GNN model that transforms scene graphs into object-focused and relation-focused graphs, with dual encoding and message passing to better integrate attributes and relations.
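The summary does not say exactly how the two graphs are built. The sketch below assumes one natural reading. In the object-focused graph, objects are nodes and each relation is a labeled edge. In the relation-focused graph, each relation triple becomes a node, and two relation nodes are linked when they share an object, in the style of a line graph. The triple format and the helpers `build_object_graph` and `build_relation_graph` are hypothetical illustrations, not the authors' code.

```python
from itertools import combinations

# Hypothetical scene-graph format: a list of (subject, predicate, object) triples.

def build_object_graph(triples):
    """Object-focused graph (assumed): objects are nodes, relations are labeled edges."""
    nodes = sorted({s for s, _, o in triples} | {o for s, _, o in triples})
    edges = [(s, o, p) for s, p, o in triples]
    return nodes, edges

def build_relation_graph(triples):
    """Relation-focused graph (assumed, line-graph style): each triple is a node,
    and two triples are linked by every object they share."""
    nodes = list(range(len(triples)))
    edges = []
    for i, j in combinations(nodes, 2):
        shared = {triples[i][0], triples[i][2]} & {triples[j][0], triples[j][2]}
        for obj in sorted(shared):
            edges.append((i, j, obj))
    return nodes, edges

triples = [("man", "riding", "horse"), ("horse", "on", "grass"), ("man", "wearing", "hat")]
print(build_object_graph(triples))
print(build_relation_graph(triples))
```

In this construction, relations that share an object ("riding" and "on" both touch "horse") become direct neighbours in the relation graph. A relation-side encoder can therefore pass information between them without routing it through object nodes.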
Findings
Achieves state-of-the-art results on GQA, VG, and motif-VG datasets.
Improves reasoning accuracy for complex questions involving attributes and relations.
Enhances interpretability of VQA models through balanced scene graph encoding.
Abstract
Modeling visual question answering (VQA) through scene graphs can significantly improve reasoning accuracy and interpretability. However, existing models answer complex reasoning questions involving attributes or relations poorly. This leads to false attribute selection or missing relations, as shown in Figure 1(a). The cause is that these models cannot balance all kinds of information in scene graphs and neglect relation and attribute information. In this paper, we introduce a novel Dual Message-passing enhanced Graph Neural Network (DM-GNN), which obtains a balanced representation by properly encoding multi-scale scene graph information. Specifically, we (i) transform the scene graph into two graphs with diversified focuses on objects and relations, then design a dual structure to encode them, which increases the weights from relations; (ii) fuse the encoder output with attribute features, which…
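The abstract is truncated and gives no layer definitions. The sketch below therefore illustrates only the data flow it describes: dual message passing over an object graph and a relation graph, followed by fusing the object-side output with attribute features. It makes several assumptions, all of which are mine and not the paper's:

- Features are plain Python lists.
- Aggregation is an unweighted mean added residually. There are no learned weights, so the "increased weights from relations" are not modeled.
- Fusion is simple concatenation.

The names `object_step`, `relation_step`, `fuse` and `dual_encode` are hypothetical.

```python
def mean(vectors, dim):
    """Element-wise mean; returns a zero vector when there are no messages."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def object_step(obj_feat, triples, rel_feat):
    """Object branch (assumed): each object averages (neighbour + relation) messages."""
    dim = len(next(iter(obj_feat.values())))
    msgs = {o: [] for o in obj_feat}
    for idx, (s, _, o) in enumerate(triples):
        msgs[s].append(add(obj_feat[o], rel_feat[idx]))
        msgs[o].append(add(obj_feat[s], rel_feat[idx]))
    return {o: add(obj_feat[o], mean(msgs[o], dim)) for o in obj_feat}

def relation_step(rel_feat, triples, obj_feat):
    """Relation branch (assumed): each relation averages messages from relations
    that share an object, each message carrying the shared object's feature."""
    dim = len(rel_feat[0])
    new = []
    for i, (si, _, oi) in enumerate(triples):
        msgs = []
        for j, (sj, _, oj) in enumerate(triples):
            if i == j:
                continue
            for shared in {si, oi} & {sj, oj}:
                msgs.append(add(rel_feat[j], obj_feat[shared]))
        new.append(add(rel_feat[i], mean(msgs, dim)))
    return new

def fuse(obj_feat, attr_feat):
    """Fusion with attribute features (assumed): concatenate the two vectors."""
    return {o: obj_feat[o] + attr_feat[o] for o in obj_feat}

def dual_encode(triples, obj_feat, rel_feat, attr_feat, rounds=1):
    for _ in range(rounds):
        # Both branches read the previous round's features (the tuple is built first).
        obj_feat, rel_feat = (object_step(obj_feat, triples, rel_feat),
                              relation_step(rel_feat, triples, obj_feat))
    return fuse(obj_feat, attr_feat), rel_feat

triples = [("man", "riding", "horse"), ("horse", "on", "grass")]
obj = {"man": [1.0], "horse": [2.0], "grass": [3.0]}
rel = [[10.0], [20.0]]
attr = {"man": [0.5], "horse": [0.5], "grass": [0.5]}
fused, rel_out = dual_encode(triples, obj, rel, attr)
print(fused)    # {'man': [13.0, 0.5], 'horse': [19.0, 0.5], 'grass': [25.0, 0.5]}
print(rel_out)  # [[32.0], [32.0]]
```

Updating both branches from the same previous-round state keeps them symmetric. Neither graph sees the other's update from the current round, so information crosses between the object and relation views only at round boundaries.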
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Advanced Image and Video Retrieval Techniques
Methods: Graph Neural Network
