Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering
Zhuoqian Yang, Zengchang Qin, Jing Yu, Yue Hu

TL;DR
This paper introduces a novel scene graph reasoning approach for VQA that leverages a structured visual representation with a deep semantic encoder and graph convolutional network, achieving state-of-the-art results.
Contribution
It proposes a visual relationship encoder and SceneGCN for improved reasoning over scene graphs in VQA tasks, enhancing interpretability and accuracy.
Findings
Achieves 54.56% accuracy on the GQA dataset.
Outperforms existing models on VQA 2.0.
Demonstrates effectiveness and interpretability of the proposed method.
Abstract
One of the key issues of Visual Question Answering (VQA) is to reason with semantic clues in the visual content under the guidance of the question; how to model relational semantics remains a great challenge. To fully capture visual semantics, we propose to reason over a structured visual representation, the scene graph, with embedded objects and inter-object relationships. This shows great benefit over vanilla vector representations and implicit visual relationship learning. Based on existing visual relationship models, we propose a visual relationship encoder that projects visual relationships into a learned deep semantic space constrained by visual context and language priors. Upon the constructed graph, we propose a Scene Graph Convolutional Network (SceneGCN) to jointly reason over the object properties and relational semantics for the correct answer. We demonstrate the model's…
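The abstract describes reasoning over a graph whose nodes are object embeddings and whose edges carry relationship embeddings, guided by the question. The block below is a minimal sketch of one generic question-guided message-passing step over such a graph. It is not the paper's SceneGCN, since the abstract gives no equations. The function name `scene_graph_layer`, the weight matrices `W_self` and `W_msg`, the dot-product attention, and all toy values are assumptions made for illustration only.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scene_graph_layer(nodes, edges, question, W_self, W_msg):
    """One round of question-guided message passing (illustrative only).

    nodes:    dict mapping object id -> feature vector
    edges:    list of (src, dst, relation_vector) triples
    question: question embedding, same size as the output features
    """
    updated = {}
    for dst, h in nodes.items():
        out = matvec(W_self, h)  # transform the node's own features
        incoming = [(src, rel) for src, d, rel in edges if d == dst]
        if incoming:
            # Each message mixes the neighbour's features with the relation.
            msgs = [matvec(W_msg, nodes[src] + rel) for src, rel in incoming]
            # Assumed attention: score each message against the question.
            scores = [sum(a * b for a, b in zip(m, question)) for m in msgs]
            for w, m in zip(softmax(scores), msgs):
                out = [o + w * v for o, v in zip(out, m)]
        updated[dst] = [max(0.0, v) for v in out]  # ReLU
    return updated


# Toy scene graph: "man" --riding--> "horse" (all values hypothetical).
nodes = {"man": [1.0, 0.0], "horse": [0.0, 1.0]}
edges = [("man", "horse", [1.0])]
question = [1.0, 0.0]
W_self = [[1.0, 0.0], [0.0, 1.0]]
W_msg = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

updated = scene_graph_layer(nodes, edges, question, W_self, W_msg)
print(updated)
```

In this toy graph only "horse" has an incoming edge, so its features absorb a message built from the "man" node and the "riding" relation, while "man" keeps its own transformed features. A trained model would learn the weights and stack several such layers before predicting an answer.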
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Interpretability
