ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

Haonan Han; Jiancheng Huang; Xiaopeng Sun; Junyan He; Rui Yang; Jie Hu; Xiaojiang Peng; Lin Ma; Xiaoming Wei; Xiu Li

arXiv:2603.25823·cs.CV·March 30, 2026

2 Repos · 1 Dataset

TL;DR

ViGoR-Bench is a comprehensive benchmark designed to evaluate the reasoning capabilities of visual generative models across multiple modalities and cognitive dimensions, revealing significant reasoning gaps in current state-of-the-art systems.

Contribution

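The paper introduces ViGoR-Bench, a unified framework that assesses and diagnoses reasoning in visual generative models through dual-track evaluation of intermediate processes and final results, an evidence-grounded automated judge, and fine-grained diagnostic analysis.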

Findings

01. State-of-the-art models show notable reasoning deficits.

02. ViGoR-Bench provides granular diagnostic insights.

03. The benchmark covers diverse cross-modal tasks.

Abstract

Beneath the stunning visual fidelity of modern AIGC models lies a "logical desert", where systems fail tasks that require physical, causal, or complex spatial reasoning. Current evaluations largely rely on superficial metrics or fragmented benchmarks, creating a "performance mirage" that overlooks the generative process. To address this, we introduce ViGoR (Vision-Generative Reasoning-centric Benchmark), a unified framework designed to dismantle this mirage. ViGoR distinguishes itself through four key innovations: 1) holistic cross-modal coverage bridging Image-to-Image and Video tasks; 2) a dual-track mechanism evaluating both intermediate processes and final results; 3) an evidence-grounded automated judge ensuring high human alignment; and 4) granular diagnostic analysis that decomposes performance into fine-grained cognitive dimensions. Experiments on over 20 leading models reveal…

Code & Models

Repositories

Datasets

VincentHancoder/ViGoR-Bench
dataset · 3.8k dl
