Syntax Tree Constrained Graph Network for Visual Question Answering
Xiangrui Su, Qi Zhang, Chongyang Shi, Jiachang Liu, and Liang Hu

TL;DR
This paper introduces a novel Syntax Tree Constrained Graph Network (STCGN) that leverages syntax trees of questions to enhance visual question answering accuracy by integrating syntactic information into the reasoning process.
Contribution
The paper proposes a new model that extracts syntax trees from questions and incorporates hierarchical syntax features into VQA, improving understanding and reasoning.
Findings
Outperforms existing VQA models on the VQA2.0 dataset
Effectively captures syntactic structures to refine visual question understanding
Demonstrates the importance of syntax information in VQA tasks
Abstract
Visual Question Answering (VQA) aims to automatically answer natural language questions related to given image content. Existing VQA methods integrate vision modeling and language understanding to explore the deep semantics of the question. However, these methods ignore the syntax information of the question, which plays a vital role in understanding the essential semantics of the question and in guiding visual feature refinement. To fill this gap, we propose a novel Syntax Tree Constrained Graph Network (STCGN) for VQA based on entity message passing and syntax trees. The model extracts a syntax tree from each question and thereby obtains more precise syntax information. Specifically, we parse questions and obtain the question syntax tree using the Stanford syntax parsing tool. From the word level and phrase level, syntactic phrase features and question features are…
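The word-level and phrase-level views of a question syntax tree mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the bracketed parse string is written by hand in Penn-Treebank style rather than produced by the Stanford parser, the helper names (`parse_bracketed`, `leaves`, `phrases`) are hypothetical, and the way STCGN turns these units into features is not covered here.

```python
def parse_bracketed(text):
    """Parse a Penn-Treebank-style bracketed string into (label, children) tuples.

    A child is either another (label, children) tuple or a plain word string.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def leaves(tree):
    """Word-level view: the words under a node, in sentence order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def phrases(tree):
    """Phrase-level view: (label, words) for every node spanning more than one word."""
    label, children = tree
    found = []
    words = leaves(tree)
    if len(words) > 1:
        found.append((label, words))
    for child in children:
        if not isinstance(child, str):
            found.extend(phrases(child))
    return found


# Hand-written parse for illustration only (an assumption, not parser output).
EXAMPLE = "(SBARQ (WHNP (WDT What) (NN color)) (SQ (VBZ is) (NP (DT the) (NN dog))))"

tree = parse_bracketed(EXAMPLE)
word_level = leaves(tree)
phrase_level = phrases(tree)

print(word_level)
for label, words in phrase_level:
    print(label, " ".join(words))
```

In this toy tree, the phrase-level units are the multi-word constituents (for example the noun phrase "the dog"), while the word-level units are the individual tokens; a real pipeline would obtain the tree from a parser instead of a hard-coded string.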
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
