Think 360°: Evaluating the Width-centric Reasoning Capability of MLLMs Beyond Depth
Mingrui Chen, Hexiong Yang, Haogeng Liu, Huaibo Huang, Ran He

TL;DR
This paper introduces a comprehensive multimodal benchmark to evaluate the reasoning width of MLLMs, complementing traditional depth-focused assessments, and reveals current models' limitations in combining wide exploration with deep reasoning.
Contribution
The paper proposes a novel width-centric evaluation protocol and a curated dataset to assess and analyze the reasoning width of multimodal large language models.
Findings
Models perform well on common-sense VQA tasks.
Current models struggle with combining deep and wide reasoning.
Failure modes suggest directions for future model improvements.
Abstract
In this paper, we present a holistic multimodal benchmark that evaluates the reasoning capabilities of MLLMs with an explicit focus on reasoning width, a complementary dimension to the more commonly studied reasoning depth. Specifically, reasoning depth measures the model's ability to carry out long-chain, sequential reasoning in which each step is tightly and rigorously linked to the next. Reasoning width, by contrast, concerns the model's capacity for broad trial-and-error search or multi-constrained optimization: it must systematically traverse many possible and parallelized reasoning paths, apply diverse constraints to prune unpromising branches, and identify valid solution routes for efficient iteration or backtracking. To this end, we carefully curate 1200+ high-quality multimodal cases spanning heterogeneous domains, and propose a fine-grained tree-of-thought evaluation…
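The kind of search the abstract associates with reasoning width can be illustrated with a toy constraint problem. The sketch below is not the paper's evaluation protocol or dataset. It is a minimal backtracking search, written under assumed names (`wide_search`, `all_different`, `a_less_than_b` are all hypothetical), that tries every candidate value at each step, prunes branches that violate a constraint, and backtracks to try the next candidate.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Assignment = Dict[str, int]
# A constraint must return True for any partial assignment that is still viable.
Constraint = Callable[[Assignment], bool]


def wide_search(
    variables: Sequence[str],
    domains: Dict[str, Sequence[int]],
    constraints: Sequence[Constraint],
) -> Tuple[List[Assignment], Dict[str, int]]:
    """Enumerate all assignments that satisfy every constraint (hypothetical helper)."""
    solutions: List[Assignment] = []
    stats = {"expanded": 0, "pruned": 0}

    def backtrack(index: int, partial: Assignment) -> None:
        if index == len(variables):
            solutions.append(dict(partial))  # copy: `partial` is mutated later
            return
        var = variables[index]
        for value in domains[var]:  # width: try every candidate at this step
            partial[var] = value
            stats["expanded"] += 1
            if all(check(partial) for check in constraints):
                backtrack(index + 1, partial)  # depth: extend the viable branch
            else:
                stats["pruned"] += 1  # constraint violated: drop this branch
            del partial[var]  # backtrack before trying the next value

    backtrack(0, {})
    return solutions, stats


def all_different(partial: Assignment) -> bool:
    values = list(partial.values())
    return len(set(values)) == len(values)


def a_less_than_b(partial: Assignment) -> bool:
    if "a" in partial and "b" in partial:
        return partial["a"] < partial["b"]
    return True  # not yet decidable on this partial assignment


variables = ["a", "b", "c"]
domains = {name: [1, 2, 3] for name in variables}
solutions, stats = wide_search(variables, domains, [all_different, a_less_than_b])
print(solutions)
print(stats)
```

In this toy setting, the recursion corresponds to extending one chain of steps (depth), while the loop over candidate values and the pruning counter correspond to exploring and discarding alternatives (width).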
Taxonomy
Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · Topic Modeling
