VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning
Lingxiao Li, Yifan Wang, Xinyan Gao, Chen Tang, Xiangyu Yue, Chenyu You

TL;DR
This paper introduces VisReason, a large-scale dataset with annotated visual reasoning examples, to enhance multimodal large language models' ability to perform complex, interpretable, and spatially grounded visual reasoning tasks.
Contribution
The paper presents the VisReason and VisReason-Pro datasets, which enable systematic training of MLLMs for human-like visual reasoning with improved accuracy and interpretability.
Findings
Fine-tuning Qwen2.5-VL on VisReason improves reasoning accuracy.
VisReason enhances interpretability and generalization in visual reasoning.
The datasets support multi-domain, step-by-step visual reasoning.
Abstract
Chain-of-Thought (CoT) prompting has proven remarkably effective for eliciting complex reasoning in large language models (LLMs). Yet, its potential in multimodal large language models (MLLMs) remains largely untapped, hindered by the absence of large-scale datasets that capture the rich, spatially grounded reasoning intrinsic to visual understanding. Existing visual-CoT resources are typically small, domain-specific, or lack the human-like stepwise structure necessary for compositional visual reasoning. In this paper, we introduce VisReason, a large-scale dataset designed to advance visual Chain-of-Thought reasoning. VisReason comprises 489K annotated examples spanning four diverse domains, each featuring multi-round, human-like rationales that guide MLLMs through interpretable visual reasoning steps. Building upon this, we curate VisReason-Pro, a 165K subset produced with a stronger…
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks
