RPTS: Tree-Structured Reasoning Process Scoring for Faithful Multimodal Evaluation

Haofeng Wang; Yu Zhang

arXiv:2511.06899 · cs.CL · February 26, 2026

TL;DR

This paper introduces RPTS, a tree-structured metric for evaluating the reasoning process in multimodal models, and presents RPTS-Eval, a benchmark to assess and analyze their reasoning capabilities.

Contribution

The paper proposes a novel tree-based reasoning process scoring method and constructs a new benchmark for more faithful evaluation of multimodal reasoning in LVLMs.

Findings

1. LVLMs show limitations in multimodal reasoning accuracy.
2. RPTS effectively pinpoints reasoning failures and strengths.
3. Benchmark reveals differences between open-source and commercial LVLMs.

Abstract

Large Vision-Language Models (LVLMs) excel in multimodal reasoning and have shown impressive performance on various multimodal benchmarks. However, most of these benchmarks evaluate models primarily through multiple-choice or short-answer formats, which do not take the reasoning process into account. Although some benchmarks assess the reasoning process, their methods are often overly simplistic and only examine reasoning when answers are incorrect. This approach overlooks scenarios where flawed reasoning leads to correct answers. In addition, these benchmarks do not consider the impact of intermodal relationships on reasoning. To address these issues, we propose the Reasoning Process Tree Score (RPTS), a tree structure-based metric to assess reasoning processes. Specifically, we organize the reasoning steps into a reasoning tree and leverage its hierarchical information to assign…
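
The abstract is cut off before it states how scores are assigned, so the following is only a minimal sketch of the general idea of scoring a reasoning tree, not the authors' RPTS formula. Everything in it is an assumption made for illustration: the `Step` class, the `tree_score` function, the convention that the root is the final answer with supporting steps as children, the per-step `correct` label, and the depth-based weight `decay ** depth`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One reasoning step (hypothetical structure, not from the paper)."""
    text: str
    correct: float  # judged correctness in [0, 1]; how it is judged is assumed
    children: List["Step"] = field(default_factory=list)


def tree_score(root: Step, decay: float = 0.5) -> float:
    """Weighted mean of step correctness over the whole tree.

    Assumption: a step at depth d gets weight decay**d, so steps closer
    to the root (the final answer) count more. This weighting is
    illustrative only; the paper's actual weights are not given here.
    """
    total = 0.0
    weight_sum = 0.0
    stack = [(root, 0)]  # iterative depth-first traversal
    while stack:
        node, depth = stack.pop()
        weight = decay ** depth
        total += weight * node.correct
        weight_sum += weight
        stack.extend((child, depth + 1) for child in node.children)
    return total / weight_sum


# A correct final answer that rests on one flawed supporting step.
example = Step("final answer", 1.0, [
    Step("reads the chart value correctly", 1.0),
    Step("misreads the caption", 0.0),
])
print(tree_score(example))  # 0.75
```

In this toy case the final answer is right, yet the score falls below 1.0 because one supporting step is wrong, which is the "flawed reasoning leads to correct answers" situation the abstract says answer-only evaluation misses.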

Code & Models

Datasets

nimingshuaishi/RPTS-Eval
dataset · 30 downloads

Videos

RPTS: Tree-Structured Reasoning Process Scoring for Faithful Multimodal Evaluation

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Topic Modeling