V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions

Chenrui Fan; Yijun Liang; Shweta Bhardwaj; Kwesi Cobbina; Ming Li; Tianyi Zhou

arXiv:2512.11995 · cs.CV · December 16, 2025

TL;DR

V-REX introduces a comprehensive benchmark and evaluation protocol for multi-step visual reasoning, emphasizing planning and following capabilities in vision-language models through a chain-of-questions approach.

Contribution

The paper presents V-REX, a novel benchmark and evaluation framework for multi-step exploratory visual reasoning, enabling detailed assessment of models' planning and following skills.

Findings

01 Significant performance gaps in current models' multi-step reasoning abilities.

02 Clear scaling trends observed across different models.

03 Notable differences between planning and following capabilities.

Abstract

While many vision-language models (VLMs) are developed to answer well-defined, straightforward questions with highly specified targets, as in most benchmarks, they often struggle in practice with complex open-ended tasks, which usually require multiple rounds of exploration and reasoning in the visual space. Such visual thinking paths not only provide step-by-step exploration and verification, like an AI detective, but also produce better interpretations of the final answers. However, these paths are challenging to evaluate because the exploration space of intermediate steps is large. To bridge this gap, we develop an evaluation suite, "Visual Reasoning with multi-step EXploration (V-REX)", which is composed of a benchmark of challenging visual reasoning tasks requiring native multi-step exploration and an evaluation protocol. V-REX covers rich application scenarios across diverse domains.…

Code & Models

Datasets

umd-zhou-lab/V-REX
dataset · 13 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling