AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs

Boyu Chang; Qi Wang; Xi Guo; Zhixiong Nan; Yazhou Yao; Tianfei Zhou

arXiv:2601.02771 · cs.CV · January 7, 2026

TL;DR

AbductiveMLLM enhances visual abductive reasoning in multimodal large language models by integrating verbal hypothesis pruning and pictorial scene imagining, leading to state-of-the-art results on VAR benchmarks.

Contribution

Inspired by human cognition, this paper introduces a dual-mode framework, built from REASONER and IMAGINER components, to improve abductive inference in MLLMs.

Findings

01. Achieves state-of-the-art performance on VAR benchmarks.

02. Outperforms traditional solutions and advanced MLLMs.

03. Demonstrates effective integration of verbal and pictorial reasoning.

Abstract

Visual abductive reasoning (VAR) is a challenging task that requires AI systems to infer the most likely explanation for incomplete visual observations. While recent MLLMs have developed strong general-purpose multimodal reasoning capabilities, they still fall short of humans in abductive inference. To bridge this gap, we draw inspiration from the interplay between verbal and pictorial abduction in human cognition and propose to strengthen the abductive ability of MLLMs by mimicking this dual-mode behavior. Concretely, we introduce AbductiveMLLM, which comprises two synergistic components: REASONER and IMAGINER. The REASONER operates in the verbal domain. It first explores a broad space of possible explanations using a blind LLM and then prunes visually incongruent hypotheses based on cross-modal causal alignment. The remaining hypotheses are introduced into the MLLM as targeted priors,…
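The REASONER's explore-then-prune step can be pictured with a minimal sketch. Everything here is an illustrative assumption, not the authors' implementation: the candidate hypotheses are hard-coded in place of a blind LLM's output, the embedding vectors are made-up toy values, and plain cosine similarity with a fixed threshold stands in for the paper's cross-modal causal alignment, which this page does not specify. The function names `cosine`, `prune_hypotheses`, and `build_prior_prompt` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prune_hypotheses(
    visual_vec: Sequence[float],
    hypotheses: Dict[str, Sequence[float]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Keep hypotheses whose (assumed) alignment score with the visual
    observation reaches the threshold, sorted from best to worst."""
    scored = [(text, cosine(visual_vec, vec)) for text, vec in hypotheses.items()]
    kept = [(text, score) for text, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def build_prior_prompt(kept: List[Tuple[str, float]]) -> str:
    """Format the surviving hypotheses as a text prior for a downstream MLLM prompt."""
    lines = [f"- {text}" for text, _ in kept]
    return "Candidate explanations consistent with the observation:\n" + "\n".join(lines)


# Toy embedding of the visual observation (assumed values, not real model features).
visual_vec = [1.0, 0.0]

# Stand-in for hypotheses proposed by a blind LLM, each with a toy text embedding.
candidates = {
    "The glass was knocked off the table.": [0.9, 0.1],
    "Someone dropped the glass while carrying it.": [0.6, 0.8],
    "It started to rain outside.": [0.0, 1.0],
}

kept = prune_hypotheses(visual_vec, candidates, threshold=0.5)
prompt = build_prior_prompt(kept)
```

In this toy setup the third hypothesis scores zero against the observation and is dropped, while the other two survive and are passed on as text priors. A real system would replace the toy vectors with features from actual vision and language encoders and would likely use a learned alignment score rather than a fixed cosine threshold.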

Code & Models

Models

🤗 PtRain/AbductiveMLLM (model)

Videos

AbductiveMLLM: Boosting Visual Abductive Reasoning Within MLLMs

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis