TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models

Shima Imani; Seungwhan Moon; Lambert Mathias; Lu Zhang; Babak Damavandi

arXiv:2512.05943 · cs.AI · December 15, 2025

TL;DR

TRACE is a framework that improves the evaluation of vision-language models by analyzing their reasoning process through auxiliary sub-questions, enabling better diagnosis and enhancement of their scientific and mathematical reasoning capabilities.

Contribution

The paper introduces TRACE, a framework that diagnoses reasoning trajectories in vision-language models using Auxiliary Reasoning Sets and consistency metrics, going beyond standard final-answer evaluation.

Findings

01. Consistency in auxiliary reasoning sets correlates with answer correctness.

02. TRACE effectively identifies failure points in reasoning steps.

03. Confidence regions help filter and improve model reliability.

Abstract

Reliable mathematical and scientific reasoning remains an open challenge for large vision-language models. Standard final-answer evaluation often masks reasoning errors, allowing silent failures to persist. To address this gap, we introduce TRACE, a framework for Transparent Reasoning And Consistency Evaluation that diagnoses reasoning trajectories rather than only end results. At its core, TRACE leverages Auxiliary Reasoning Sets (ARS), compact sub-question-answer pairs that decompose complex problems, evaluate intermediate steps through consistency-based metrics, and expose failures overlooked by standard evaluation. Our experiments show that consistency across ARS correlates with final-answer correctness and helps pinpoint the reasoning steps where failures arise, offering actionable signals for model improvement. Furthermore, TRACE defines confidence regions that distinguish reliable from…
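
The idea of scoring intermediate steps by consistency can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify the paper's actual metrics, so agreement with the majority answer stands in for a consistency measure, and the function names (step_consistency, trace_consistency, first_unstable_step, confidence_region) and the thresholds 0.5, 0.6, and 0.8 are hypothetical rather than taken from TRACE.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def step_consistency(answers: List[str]) -> float:
    """Fraction of sampled answers that match the most common answer.

    Illustrative stand-in for a consistency metric, not the paper's definition.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalized = [a.strip().lower() for a in answers]
    top_count = Counter(normalized).most_common(1)[0][1]
    return top_count / len(normalized)


def trace_consistency(
    ars_samples: Dict[str, List[str]],
) -> Tuple[float, Dict[str, float]]:
    """Return (mean consistency, per-sub-question consistency).

    ars_samples maps each sub-question to the answers a model gave
    across several sampled reasoning runs.
    """
    if not ars_samples:
        raise ValueError("need at least one sub-question")
    per_step = {q: step_consistency(a) for q, a in ars_samples.items()}
    overall = sum(per_step.values()) / len(per_step)
    return overall, per_step


def first_unstable_step(
    per_step: Dict[str, float], threshold: float = 0.6
) -> Optional[str]:
    """First sub-question (in insertion order) below the hypothetical threshold."""
    for question, score in per_step.items():
        if score < threshold:
            return question
    return None


def confidence_region(overall: float, low: float = 0.5, high: float = 0.8) -> str:
    """Bucket an overall score into a coarse region; cut-offs are assumed."""
    if overall >= high:
        return "reliable"
    if overall >= low:
        return "uncertain"
    return "unreliable"


# Toy data: four sampled runs answering three sub-questions about a circle.
samples = {
    "What is the radius?": ["5", "5", "5", "5"],
    "What is the area formula?": ["pi r^2", "pi r^2", "2 pi r", "pi r^2"],
    "What is the area?": ["78.5", "31.4", "25", "78.5"],
}
overall, per_step = trace_consistency(samples)
print(overall, first_unstable_step(per_step), confidence_region(overall))
```

On this toy data the per-step scores are 1.0, 0.75, and 0.5, so the last sub-question is flagged as the first unstable step and the mean of 0.75 falls in the middle region. A real pipeline would likely need a more careful notion of answer equivalence than lowercase string matching.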

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques