REX: Reasoning-aware and Grounded Explanation

Shi Chen; Qi Zhao

arXiv:2203.06107 · cs.CV · March 14, 2022

TL;DR

This paper introduces REX, a reasoning-aware and grounded explanation framework for visual reasoning, which generates multi-modal explanations by traversing reasoning steps and grounding keywords, improving interpretability and reasoning accuracy.

Contribution

The paper proposes a novel multi-modal explanation method that models word-region correspondence and constructs a large dataset of explanations, advancing interpretability in visual reasoning.

Findings

01. Enhanced visual grounding capability

02. Improved interpretability and reasoning performance

03. Effective under multi-task and transfer learning settings

Abstract

Effectiveness and interpretability are two essential properties for trustworthy AI systems. Most recent studies in visual reasoning are dedicated to improving the accuracy of predicted answers, and less attention is paid to explaining the rationales behind the decisions. As a result, the resulting models commonly take advantage of spurious biases instead of actually reasoning on the visual-textual data, and have yet to develop the capability to explain their decision making by considering key information from both modalities. This paper aims to close the gap from three distinct perspectives: first, we define a new type of multi-modal explanations that explain the decisions by progressively traversing the reasoning process and grounding keywords in the images. We develop a functional program to sequentially execute different reasoning steps and construct a new dataset with 1,040,830 multi-modal…
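To make the idea of sequentially executing reasoning steps and grounding keywords more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. The scene, the `Obj` class, the two operations (`select`, `filter_attr`), the `[#id]` grounding notation, and the explanation template are all hypothetical names and choices made up for illustration. The sketch only shows the general pattern: run a list of steps in order over annotated image regions, then build an explanation whose keywords point back to those regions.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    """A hypothetical annotated image region (not the paper's data format)."""
    obj_id: str   # region identifier used as the grounding token
    name: str     # object category
    attrs: set    # attribute words
    box: tuple    # (x1, y1, x2, y2) bounding box


# Toy scene, invented for this example.
SCENE = [
    Obj("#1", "cup", {"red"}, (10, 20, 50, 60)),
    Obj("#2", "table", {"wooden"}, (0, 50, 200, 150)),
    Obj("#3", "cup", {"blue"}, (120, 25, 160, 65)),
]


def select(objs, name):
    """Reasoning step: keep regions of a given category."""
    return [o for o in objs if o.name == name]


def filter_attr(objs, attr):
    """Reasoning step: keep regions that carry a given attribute."""
    return [o for o in objs if attr in o.attrs]


# A "functional program": an ordered list of (operation, argument) steps.
PROGRAM = [(select, "cup"), (filter_attr, "red")]


def run(program, scene):
    """Execute the steps in order and record which regions survive each one."""
    current = scene
    trace = []
    for op, arg in program:
        current = op(current, arg)
        trace.append((op.__name__, arg, [o.obj_id for o in current]))
    return current, trace


def explain(objs):
    """Build a textual explanation whose keywords carry region ids."""
    parts = [f"{' '.join(sorted(o.attrs))} {o.name} [{o.obj_id}]" for o in objs]
    return "the " + " and the ".join(parts)


result, trace = run(PROGRAM, SCENE)
explanation = explain(result)
grounding = {o.obj_id: o.box for o in result}  # keyword id -> image region
```

In this toy setup, `trace` records the intermediate result of every step, so the explanation can be checked against the reasoning path, and `grounding` maps each id mentioned in the text to a bounding box.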

Code & Models

Repositories

szzexpoi/rex (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Video Analysis and Summarization