REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering

Siwen Luo; Soyeon Caren Han; Kaiyuan Sun; Josiah Poon

arXiv:2007.13262 · cs.CV · April 6, 2021

TL;DR

REXUP is a deep reasoning model for visual question answering that uses structured visual and textual information to capture step-by-step reasoning and complex object relationships. It outperforms previous methods on the GQA dataset.

Contribution

The paper introduces REXUP, a deep reasoning VQA model that combines explicit visual structure-aware textual information with a dual-branch architecture, advancing state-of-the-art performance.

Findings

01. Achieves 92.7% validation accuracy on the GQA dataset

02. Outperforms previous state-of-the-art methods

03. Demonstrates effectiveness through extensive ablation studies

Abstract

Visual question answering (VQA) is a challenging multi-modal task that requires not only a semantic understanding of both images and questions, but also a sound perception of the step-by-step reasoning process that leads to the correct answer. So far, most successful attempts in VQA have focused on only one aspect: either the interaction between visual pixel features of images and word features of questions, or the reasoning process of answering a question about an image with simple objects. In this paper, we propose a deep reasoning VQA model with explicit visual structure-aware textual information, which works well in capturing the step-by-step reasoning process and detecting complex object relationships in photo-realistic images. The REXUP network consists of two branches, image object-oriented and scene graph-oriented, which jointly work with super-diagonal fusion compositional…
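The abstract is truncated here, so the actual REXUP cell is not reproduced. As a rough illustration of the dual-branch idea only, the following is a minimal sketch, assuming each branch attends over its own feature set (object features or scene-graph features) using a question vector, and that the attention score comes from a simplified block-diagonal bilinear fusion. Every name below (`block_bilinear_fusion`, `attend`, `dual_branch_step`), the summation used to merge the two branches, and the random weights are hypothetical and are not taken from the paper or its repository.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_bilinear_fusion(x, y, cores):
    """Fuse x and y chunk by chunk (a simplified, assumed form of fusion).

    For chunk k: z_k[o] = sum_ij x_k[i] * cores[k][i][j][o] * y_k[j].
    Only matching chunks of x and y interact, which is the
    block-diagonal restriction on the full bilinear tensor.
    """
    size = len(x) // len(cores)
    out = []
    for k, core in enumerate(cores):
        xk = x[k * size:(k + 1) * size]
        yk = y[k * size:(k + 1) * size]
        for o in range(len(core[0][0])):
            out.append(sum(xk[i] * core[i][j][o] * yk[j]
                           for i in range(size) for j in range(size)))
    return out


def attend(query, items, cores):
    """Score each item against the query, then return the weighted sum."""
    scores = [sum(block_bilinear_fusion(query, item, cores)) for item in items]
    weights = softmax(scores)
    dim = len(items[0])
    summary = [sum(w * item[d] for w, item in zip(weights, items))
               for d in range(dim)]
    return summary, weights


def dual_branch_step(question_vec, object_feats, graph_feats, cores):
    """One hypothetical reasoning step: attend in both branches, then merge."""
    obj_summary, obj_w = attend(question_vec, object_feats, cores)
    graph_summary, graph_w = attend(question_vec, graph_feats, cores)
    # Assumption: the branches are merged by element-wise addition.
    combined = [a + b for a, b in zip(obj_summary, graph_summary)]
    return combined, obj_w, graph_w


# Toy usage with random values: 4-dim vectors, 2 chunks of size 2.
rng = random.Random(0)
dim, n_chunks, size, out_size = 4, 2, 2, 2
cores = [[[[rng.uniform(-1, 1) for _ in range(out_size)]
           for _ in range(size)] for _ in range(size)]
         for _ in range(n_chunks)]
question = [rng.uniform(-1, 1) for _ in range(dim)]
objects = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
graph = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
combined, obj_w, graph_w = dual_branch_step(question, objects, graph, cores)
print(combined, obj_w, graph_w)
```

The chunked cores keep the parameter count far below that of a full bilinear tensor, since chunk k of one input never interacts with chunk m of the other. A real model would learn these weights and repeat the step several times.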

Code & Models

Repositories

usydnlp/REXUP (TensorFlow, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques