Visual Abductive Reasoning

Chen Liang; Wenguan Wang; Tianfei Zhou; Yi Yang

arXiv:2203.14040 · cs.CV · March 29, 2022

TL;DR

This paper introduces Visual Abductive Reasoning (VAR), a new task and dataset that require AI systems to infer the most plausible explanation for incomplete visual observations. This form of reasoning is rarely explored in computer vision.

Contribution

It proposes the large-scale VAR dataset and a strong baseline, Reasoner (causal-and-cascaded reasoning Transformer). Reasoner captures the causal structure of the observations and refines its hypotheses, advancing abductive reasoning in visual understanding.

Findings

01

Reasoner outperforms many video-language models on VAR

02

Models still lag behind human performance in abductive reasoning

03

The dataset and model foster future research in reasoning beyond observation

Abstract

Abductive reasoning seeks the likeliest possible explanation for partial observations. Although abduction is frequently employed in everyday human reasoning, it is rarely explored in the computer vision literature. In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining the abductive reasoning ability of machine intelligence in everyday visual situations. Given an incomplete set of visual events, AI systems are required not only to describe what is observed, but also to infer the hypothesis that can best explain the visual premise. Based on our large-scale VAR dataset, we devise a strong baseline model, Reasoner (causal-and-cascaded reasoning Transformer). First, to capture the causal structure of the observations, a contextualized directional position embedding strategy is adopted in the encoder, which yields discriminative representations for the premise and…
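The abstract is cut off before it details the encoder. The block below is therefore only a minimal sketch of what a *directional* position embedding can look like, and it rests on our own assumptions. Each event is tagged with a signed offset from the hypothesis (explanation) slot, so events before and after that slot receive different vectors.

Several parts of the sketch are hypothetical:
- the names `directional_offsets`, `DirectionalPositionTable` and `add_directional_positions`;
- the placement of the hypothesis among the premise events;
- the clipping at `max_offset`;
- the random initialization.

It also leaves out the "contextualized" part entirely. It is not the Reasoner implementation from leonnnop/var.

```python
import random
from typing import List

Vector = List[float]


def directional_offsets(num_events: int, hypothesis_idx: int) -> List[int]:
    """Signed offset of every event from the (assumed) hypothesis slot.

    Negative values mark events before the hypothesis, positive values mark
    events after it, and 0 marks the hypothesis position itself.
    """
    if not 0 <= hypothesis_idx < num_events:
        raise ValueError("hypothesis_idx must index one of the events")
    return [i - hypothesis_idx for i in range(num_events)]


class DirectionalPositionTable:
    """Hypothetical lookup table with one vector per signed offset.

    Offsets outside [-max_offset, max_offset] are clipped to the nearest edge.
    In a trained model these rows would be learned parameters; here they are
    just small random numbers drawn from a seeded generator.
    """

    def __init__(self, max_offset: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.max_offset = max_offset
        self.dim = dim
        # Row k holds the vector for offset (k - max_offset).
        self.rows = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(2 * max_offset + 1)]

    def lookup(self, offset: int) -> Vector:
        clipped = max(-self.max_offset, min(self.max_offset, offset))
        return self.rows[clipped + self.max_offset]


def add_directional_positions(features: List[Vector], hypothesis_idx: int,
                              table: DirectionalPositionTable) -> List[Vector]:
    """Add the direction-aware position vector to each event feature."""
    offsets = directional_offsets(len(features), hypothesis_idx)
    out = []
    for feat, off in zip(features, offsets):
        if len(feat) != table.dim:
            raise ValueError("feature size must match the table dimension")
        out.append([f + p for f, p in zip(feat, table.lookup(off))])
    return out


if __name__ == "__main__":
    table = DirectionalPositionTable(max_offset=3, dim=4)
    events = [[0.0] * 4 for _ in range(5)]  # five placeholder event features
    encoded = add_directional_positions(events, hypothesis_idx=2, table=table)
    print(directional_offsets(5, 2))  # [-2, -1, 0, 1, 2]
```

Take five events with the hypothesis at index 2. The offsets are -2, -1, 0, 1, 2, so the event one step before the hypothesis and the event one step after it look up different rows. A plain absolute position embedding would not express that before/after relation to the hypothesis directly.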

Code & Models

Repositories

leonnnop/var
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization