Journey Before Destination: On the importance of Visual Faithfulness in Slow Thinking

Rheeya Uppaal; Phu Mon Htut; Min Bai; Nikolaos Pappas; Zheng Qi; Sandesh Swamy

arXiv:2512.12218 · cs.CV · December 22, 2025

TL;DR

This paper argues that the visual faithfulness of reasoning chains generated by vision-language models should be evaluated separately from final-answer accuracy. It proposes a training- and reference-free evaluation metric and a self-reflection method to improve the reliability of multimodal reasoning.

Contribution

The paper introduces a metric for assessing visual faithfulness in reasoning chains and a training-free self-reflection technique that detects and regenerates unfaithful perception steps.
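
As a rough illustration of the detect-and-regenerate idea, the sketch below walks a reasoning chain, checks each perception step, and rewrites any step flagged as unfaithful. It is only a minimal sketch under stated assumptions, not the paper's procedure: the function `reflect_and_regenerate`, the `max_retries` parameter, and the three callbacks (`is_perception`, `is_faithful`, `regenerate`) are hypothetical names, and the toy stubs stand in for model calls that would need an actual VLM and image.

```python
from typing import Callable, List


def reflect_and_regenerate(
    steps: List[str],
    is_perception: Callable[[str], bool],
    is_faithful: Callable[[str], bool],
    regenerate: Callable[[str, List[str]], str],
    max_retries: int = 2,  # hypothetical retry budget, not taken from the paper
) -> List[str]:
    """Return a copy of the chain with unfaithful perception steps rewritten."""
    revised: List[str] = []
    for step in steps:
        if is_perception(step):
            tries = 0
            # Only perception steps are checked against the image.
            while not is_faithful(step) and tries < max_retries:
                step = regenerate(step, revised)  # earlier steps passed as context
                tries += 1
        revised.append(step)
    return revised


# Toy stubs (assumptions): a real system would query a VLM here.
def toy_is_perception(step: str) -> bool:
    return step.startswith("The sign")


def toy_is_faithful(step: str) -> bool:
    return step == "The sign shows 30."


def toy_regenerate(step: str, context: List[str]) -> str:
    return "The sign shows 30."


original = ["The sign shows 40.", "Therefore the limit is below 50."]
revised = reflect_and_regenerate(
    original, toy_is_perception, toy_is_faithful, toy_regenerate
)
```

In this toy run the first step is replaced and the reasoning step is left untouched, which mirrors the stated goal of improving faithfulness without any training.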

Findings

01. Reduces the unfaithful perception rate in reasoning chains

02. Maintains final-answer accuracy while improving faithfulness

03. Enhances the reliability of multimodal reasoning models

Abstract

Reasoning-augmented vision language models (VLMs) generate explicit chains of thought that promise greater capability and transparency but also introduce new failure modes: models may reach correct answers via visually unfaithful intermediate steps, or reason faithfully yet fail on the final prediction. Standard evaluations that only measure final-answer accuracy cannot distinguish these behaviors. We introduce the visual faithfulness of reasoning chains as a distinct evaluation dimension, focusing on whether the perception steps of a reasoning chain are grounded in the image. We propose a training- and reference-free framework that decomposes chains into perception versus reasoning steps and uses off-the-shelf VLM judges for step-level faithfulness, additionally verifying this approach through a human meta-evaluation. Building on this metric, we present a lightweight self-reflection…
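
The step-level evaluation described in the abstract can be pictured with a small sketch. The code below assumes the chain has already been split into perception and reasoning steps, and it computes an unfaithful perception rate as the fraction of perception steps that a judge rejects. That definition, the `Step` class, the function name, and the toy judge are assumptions made for illustration; the paper's judge is an off-the-shelf VLM that looks at the image, which is replaced here by a lookup against a fixed set of facts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    text: str
    kind: str  # "perception" or "reasoning"; labels assumed to come from a decomposition pass


def unfaithful_perception_rate(steps: List[Step], judge: Callable[[str], bool]) -> float:
    """Fraction of perception steps the judge marks as not grounded in the image.

    Assumed definition for illustration; reasoning steps are ignored.
    """
    perception = [s for s in steps if s.kind == "perception"]
    if not perception:
        return 0.0  # no perception steps, so nothing can be unfaithful
    unfaithful = sum(1 for s in perception if not judge(s.text))
    return unfaithful / len(perception)


# Toy stand-in for a VLM judge: a step counts as faithful if it matches a known image fact.
IMAGE_FACTS = {"There are two cats.", "The sofa is green."}


def toy_judge(step_text: str) -> bool:
    return step_text in IMAGE_FACTS


chain = [
    Step("There are two cats.", "perception"),
    Step("The sofa is red.", "perception"),  # not supported by the image
    Step("So two animals are on a sofa.", "reasoning"),
]

rate = unfaithful_perception_rate(chain, toy_judge)
```

Here one of the two perception steps is rejected, so `rate` is 0.5. Because the reasoning step never reaches the judge, a chain can have a correct final answer and still score poorly, which is the distinction the abstract draws.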

Taxonomy

Topics: Multimodal Machine Learning Applications · Embodied and Extended Cognition · Child and Animal Learning Development