V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions
Chenrui Fan, Yijun Liang, Shweta Bhardwaj, Kwesi Cobbina, Ming Li, Tianyi Zhou

TL;DR
V-REX introduces a comprehensive benchmark and evaluation protocol for multi-step visual reasoning, emphasizing planning and following capabilities in vision-language models through a chain-of-questions approach.
Contribution
The paper presents V-REX, a novel benchmark and evaluation framework for multi-step exploratory visual reasoning, enabling detailed assessment of models' planning and following skills.
Findings
Significant performance gaps in current models' multi-step reasoning abilities.
Clear scaling trends observed across different models.
Notable differences between planning and following capabilities.
Abstract
While many vision-language models (VLMs) are developed to answer well-defined, straightforward questions with highly specified targets, as in most benchmarks, they often struggle in practice with complex open-ended tasks, which usually require multiple rounds of exploration and reasoning in the visual space. Such visual thinking paths not only provide step-by-step exploration and verification, much like an AI detective, but also produce better interpretations of the final answers. However, these paths are challenging to evaluate due to the large exploration space of intermediate steps. To bridge the gap, we develop an evaluation suite, "Visual Reasoning with multi-step EXploration (V-REX)", which is composed of a benchmark of challenging visual reasoning tasks requiring native multi-step exploration and an evaluation protocol. V-REX covers rich application scenarios across diverse domains.…
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling
