Beyond Accuracy: Evaluating Grounded Visual Evidence in Thinking with Images

Xuchen Li; Xuzhao Li; Renjie Pi; Shiyu Hu; Jian Zhao; Jiahui Gao

arXiv:2601.11633·cs.CV·January 21, 2026

Open Access

TL;DR

This paper introduces ViEBench, a benchmark that evaluates whether vision-language models ground their visual reasoning in relevant evidence. It goes beyond answer accuracy by adding detailed diagnostics and expert-annotated visual evidence, with a focus on faithfulness and explainability.

Contribution

The paper presents ViEBench, a process-verifiable benchmark of 200 multi-scenario high-resolution images with expert-annotated visual evidence, together with a dual-axis evaluation matrix for assessing visual reasoning and grounding fidelity in VLMs.

Findings

01. VLMs sometimes produce correct answers without relevant evidence.

02. Models can locate evidence but fail to use it effectively.

03. ViEBench enables transparent diagnosis of model reasoning behaviors.
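
The first two findings describe cases where answer correctness and evidence grounding disagree. The sketch below shows one way such cases could be tallied along two axes. It is an illustration only and is not the paper's evaluation code: the abstract is truncated before it describes the dual-axis matrix, so the cell names, the `(x1, y1, x2, y2)` box format, the `Sample` type, and the IoU threshold of 0.5 are all assumptions made here.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical box format: (x1, y1, x2, y2) in pixels. Not taken from the paper.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Sample:
    """One model response (hypothetical record layout)."""
    answer_correct: bool   # axis 1: is the final answer right?
    predicted_box: Box     # region the model pointed to as evidence
    annotated_box: Box     # expert-annotated evidence region


def cell(sample: Sample, iou_threshold: float = 0.5) -> str:
    """Place a sample in one of four cells. The threshold is an assumed value."""
    grounded = iou(sample.predicted_box, sample.annotated_box) >= iou_threshold
    answer = "correct" if sample.answer_correct else "wrong"
    evidence = "grounded" if grounded else "ungrounded"
    return f"{answer}_{evidence}"


def tally(samples: list[Sample], iou_threshold: float = 0.5) -> Counter:
    """Count how many samples fall in each of the four cells."""
    return Counter(cell(s, iou_threshold) for s in samples)


samples = [
    Sample(True, (0, 0, 10, 10), (0, 0, 10, 10)),    # right answer, right region
    Sample(True, (0, 0, 10, 10), (20, 20, 30, 30)),  # right answer, wrong region
    Sample(False, (0, 0, 10, 10), (0, 0, 10, 8)),    # right region, wrong answer
]
counts = tally(samples)
```

Under these assumptions, the `correct_ungrounded` cell corresponds to finding 01 and the `wrong_grounded` cell to finding 02. Reporting all four counts, rather than accuracy alone, is what makes the two failure modes visible.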

Abstract

Despite the remarkable progress of Vision-Language Models (VLMs) in adopting "Thinking-with-Images" capabilities, accurately evaluating the authenticity of their reasoning process remains a critical challenge. Existing benchmarks mainly rely on outcome-oriented accuracy, lacking the capability to assess whether models can accurately leverage fine-grained visual cues for multi-step reasoning. To address these limitations, we propose ViEBench, a process-verifiable benchmark designed to evaluate faithful visual reasoning. Comprising 200 multi-scenario high-resolution images with expert-annotated visual evidence, ViEBench uniquely categorizes tasks by difficulty into perception and reasoning dimensions, where reasoning tasks require utilizing localized visual details with prior knowledge. To establish comprehensive evaluation criteria, we introduce a dual-axis matrix that provides…

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Child and Animal Learning Development