Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering

Zhuoqian Yang; Zengchang Qin; Jing Yu; Yue Hu

arXiv:1812.09681 · cs.MM · August 22, 2019 · 26 cites

TL;DR

This paper proposes a scene graph reasoning approach for VQA: a visual relationship encoder projects inter-object relationships into a learned deep semantic space, and a graph convolutional network (SceneGCN) reasons over the resulting scene graph, achieving state-of-the-art results.

Contribution

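The paper proposes two components: a visual relationship encoder that embeds inter-object relationships under visual-context and language-prior constraints, and SceneGCN, which reasons jointly over object properties and relational semantics on the scene graph. Together they aim to improve both accuracy and interpretability in VQA.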

Findings

1. Achieves 54.56% accuracy on the GQA dataset.
2. Outperforms existing models on VQA 2.0.
3. Demonstrates the effectiveness and interpretability of the proposed method.

Abstract

One of the key issues in Visual Question Answering (VQA) is reasoning with semantic clues in the visual content under the guidance of the question; how to model relational semantics remains a great challenge. To fully capture visual semantics, we propose to reason over a structured visual representation, the scene graph, with embedded objects and inter-object relationships. This shows great benefit over vanilla vector representations and implicit visual relationship learning. Building on existing visual relationship models, we propose a visual relationship encoder that projects visual relationships into a learned deep semantic space constrained by visual context and language priors. Upon the constructed graph, we propose a Scene Graph Convolutional Network (SceneGCN) to jointly reason over object properties and relational semantics for the correct answer. We demonstrate the model's…
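The abstract does not give SceneGCN's update rule. As a rough illustration of what "reasoning over a scene graph" can mean, the sketch below performs one generic, question-guided graph-convolution step in plain Python: each object node aggregates messages from its neighbors, each message combines the neighbor's features with the embedding of the relationship on that edge, and incoming messages are weighted by a softmax over question-relationship similarity. The function `scene_graph_conv`, the weight matrices `w_self` and `w_msg`, and the attention scheme are hypothetical choices for illustration, not the authors' SceneGCN.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def scene_graph_conv(
    nodes: Dict[str, Vector],               # object id -> object feature
    edges: List[Tuple[str, str, Vector]],   # (subject, object, relation embedding)
    question: Vector,                       # question embedding (same dim)
    w_self: Matrix,                         # hypothetical self-loop weights
    w_msg: Matrix,                          # hypothetical message weights
) -> Dict[str, Vector]:
    """One hypothetical question-guided message-passing step (not the paper's rule)."""
    # Collect (score, message) pairs for each receiving node.
    incoming: Dict[str, List[Tuple[float, Vector]]] = {n: [] for n in nodes}
    for subj, obj, rel in edges:
        score = dot(question, rel)  # how relevant this relationship is to the question
        msg = matvec(w_msg, add(nodes[subj], rel))  # neighbor features + relation
        incoming[obj].append((score, msg))

    out: Dict[str, Vector] = {}
    for n, h in nodes.items():
        agg = [0.0] * len(h)
        if incoming[n]:
            # Numerically stable softmax over the incoming edge scores.
            scores = [s for s, _ in incoming[n]]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            for e, (_, msg) in zip(exps, incoming[n]):
                agg = [a + (e / z) * mi for a, mi in zip(agg, msg)]
        # Combine the node's own transformed features with the attended messages.
        out[n] = relu(add(matvec(w_self, h), agg))
    return out


# Toy usage: "man" -(relation)-> "horse", 2-dim features, identity weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
nodes = {"man": [1.0, 0.0], "horse": [0.0, 1.0]}
edges = [("man", "horse", [0.5, 0.5])]
out = scene_graph_conv(nodes, edges, [1.0, 1.0], identity, identity)
print(out)  # "man" has no incoming edges; "horse" receives man + relation
```

Stacking several such steps lets information propagate along multi-hop relationship paths, which is the general motivation for graph convolution over scene graphs; the actual layer design, normalization, and how the question enters SceneGCN should be taken from the paper itself.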

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Interpretability