TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models
Shima Imani, Seungwhan Moon, Lambert Mathias, Lu Zhang, Babak Damavandi

TL;DR
TRACE is a framework that improves the evaluation of vision-language models by analyzing their reasoning process through auxiliary sub-questions, enabling better diagnosis and enhancement of their scientific and mathematical reasoning capabilities.
Contribution
The paper introduces TRACE, a novel framework that diagnoses reasoning trajectories in vision-language models using auxiliary reasoning sets and consistency metrics, going beyond standard final-answer evaluation.
Findings
Consistency in auxiliary reasoning sets correlates with answer correctness.
TRACE effectively identifies failure points in reasoning steps.
Confidence regions help filter and improve model reliability.
Abstract
Reliable mathematical and scientific reasoning remains an open challenge for large vision-language models. Standard final-answer evaluation often masks reasoning errors, allowing silent failures to persist. To address this gap, we introduce TRACE, a framework for Transparent Reasoning And Consistency Evaluation that diagnoses reasoning trajectories rather than only end results. At its core, TRACE leverages Auxiliary Reasoning Sets (ARS), compact sub-question/answer pairs that decompose complex problems; it evaluates intermediate steps through consistency-based metrics and exposes failures that standard evaluation overlooks. Our experiments show that consistency across ARS correlates with final-answer correctness and helps pinpoint the reasoning steps where failures arise, offering actionable signals for model improvement. Furthermore, TRACE defines confidence regions that distinguish reliable from…
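The summary does not give TRACE's exact metric definitions, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the model is run several times over the same Auxiliary Reasoning Set, that "consistency" means the fraction of runs agreeing with the majority final answer, that per-step agreement can hint at where reasoning diverges, and that confidence regions are simple score thresholds. The names `ARSRun`, `consistency_score`, `step_agreement`, and `confidence_region`, and the 0.4/0.8 thresholds, are hypothetical and not taken from the paper.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class ARSRun:
    """Hypothetical record of one model pass over an Auxiliary Reasoning Set:
    the answer to each sub-question plus the final answer."""
    sub_answers: list[str]
    final_answer: str


def _norm(text: str) -> str:
    # Light normalization so trivially different strings still agree.
    return text.strip().lower()


def consistency_score(runs: list[ARSRun]) -> tuple[str, float]:
    """Return the majority final answer and the fraction of runs agreeing with it.
    This majority-agreement rate is an assumed metric, not the paper's definition."""
    if not runs:
        raise ValueError("need at least one ARS run")
    counts = Counter(_norm(r.final_answer) for r in runs)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(runs)


def step_agreement(runs: list[ARSRun]) -> list[float]:
    """For each sub-question index, the fraction of runs giving the modal answer.
    A low value suggests the step where trajectories start to diverge."""
    if not runs:
        raise ValueError("need at least one ARS run")
    n_steps = min(len(r.sub_answers) for r in runs)
    scores = []
    for i in range(n_steps):
        counts = Counter(_norm(r.sub_answers[i]) for r in runs)
        scores.append(counts.most_common(1)[0][1] / len(runs))
    return scores


def confidence_region(score: float, low: float = 0.4, high: float = 0.8) -> str:
    """Map a consistency score to a region using hypothetical thresholds."""
    if score >= high:
        return "reliable"
    if score < low:
        return "unreliable"
    return "uncertain"


if __name__ == "__main__":
    runs = [
        ARSRun(["12", "3"], "4"),
        ARSRun(["12", "3"], "4"),
        ARSRun(["12", "4"], "3"),
    ]
    answer, score = consistency_score(runs)
    print(answer, round(score, 3), step_agreement(runs), confidence_region(score))
```

In this toy example all runs agree on the first sub-question but split on the second, so per-step agreement drops exactly where the final answers diverge; a real evaluation would replace the string-matching and thresholds with whatever TRACE actually specifies.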
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
