Visual Abductive Reasoning
Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang

TL;DR
This paper introduces Visual Abductive Reasoning (VAR), a new task and dataset in which AI systems must infer the most plausible explanation for an incomplete set of visual observations.
Contribution
The paper contributes the VAR dataset and Reasoner, a causal-and-cascaded reasoning Transformer that serves as a strong baseline: it captures the causal structure of the observations and refines its hypotheses.
Findings
Reasoner outperforms many video-language models on VAR
Models still lag behind human performance in abductive reasoning
The dataset and model foster future research in reasoning beyond observation
Abstract
Abductive reasoning seeks the likeliest possible explanation for partial observations. Although abduction is frequently employed in human daily reasoning, it is rarely explored in computer vision literature. In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining abductive reasoning ability of machine intelligence in everyday visual situations. Given an incomplete set of visual events, AI systems are required to not only describe what is observed, but also infer the hypothesis that can best explain the visual premise. Based on our large-scale VAR dataset, we devise a strong baseline model, Reasoner (causal-and-cascaded reasoning Transformer). First, to capture the causal structure of the observations, a contextualized directional position embedding strategy is adopted in the encoder, that yields discriminative representations for the premise and…
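The task setup described in the abstract can be sketched as a small data structure: a temporally ordered list of events, some observed (the premise) and one hidden (the hypothesis to be inferred). This is only an illustrative sketch, not the authors' data format or model. The names `Event`, `split_premise_hypothesis`, and `signed_offsets` are hypothetical, and the signed offset is merely one simple way to picture why direction (before versus after the hypothesis) could matter to an encoder.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    """One visual event in a sequence (hypothetical structure, not the VAR file format)."""
    index: int                         # temporal position in the sequence
    observed: bool                     # False marks the hidden hypothesis event
    description: Optional[str] = None  # text a model is expected to produce


def split_premise_hypothesis(events: List[Event]) -> Tuple[List[Event], List[Event]]:
    """Separate observed premise events from hidden hypothesis events."""
    premise = [e for e in events if e.observed]
    hypothesis = [e for e in events if not e.observed]
    return premise, hypothesis


def signed_offsets(premise: List[Event], hypothesis: Event) -> List[int]:
    """Signed temporal offset of each premise event relative to the hypothesis.

    Negative means the premise event happens before the hypothesis,
    positive means after. This is an assumption used for illustration only.
    """
    return [e.index - hypothesis.index for e in premise]


# Three events: the middle one is unobserved and must be explained.
events = [Event(0, True), Event(1, False), Event(2, True)]
premise, hypothesis = split_premise_hypothesis(events)
offsets = signed_offsets(premise, hypothesis[0])
```

In this toy case the premise events sit one step before and one step after the hidden event, so a system would have to describe both observed events and propose an explanation that fits between them.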
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization
