Characterizing, Evaluating, and Optimizing Complex Reasoning
Haoran Zhang, Yafu Li, Zhi Wang, Zhilin Wang, Shunkai Zhang, Xiaoye Qu, Yu Cheng

TL;DR
This paper introduces a unified framework for defining, evaluating, and optimizing complex reasoning in large models, using DAG-based evaluation and a learned reward model to improve reasoning quality and task performance.
Contribution
It proposes the ME$^2$ principle for characterizing reasoning quality, models reasoning traces as directed acyclic graphs (DAGs) for pairwise evaluation, and constructs the TRM-Preference dataset to train a Thinking Reward Model (TRM) for scalable evaluation and optimization.
Findings
Thinking rewards improve reasoning outcomes by up to 19.3%.
Better reasoning selection enhances performance by up to 3.9%.
DAG-based evaluation effectively captures complex reasoning structures.
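One plausible reading of "reasoning selection" is best-of-N selection: sample several candidate traces and keep the one a reward model scores highest. The summary does not say how selection is actually performed, so the sketch below is only an illustration under that assumption. `select_best` and `toy_score` are hypothetical names, and `toy_score` is a crude stand-in for a learned reward model, not the TRM itself.

```python
from typing import Callable, Sequence


def select_best(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate trace with the highest score (first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate trace")
    return max(candidates, key=score)


def toy_score(trace: str) -> float:
    """Hypothetical stand-in for a reward model: fewer words scores higher."""
    return -float(len(trace.split()))


# Hypothetical candidate traces for the same question.
candidates = [
    "try x=2, fails; try x=3, fails; try x=4, works; answer 4",
    "solve 2x = 8 directly; answer 4",
]
best = select_best(candidates, toy_score)
```

In practice the scoring function would be a trained model that takes the full trace as input; the selection logic itself stays the same.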
Abstract
Large Reasoning Models (LRMs) increasingly rely on reasoning traces with complex internal structures. However, existing work lacks a unified answer to three fundamental questions: (1) what defines high-quality reasoning, (2) how to reliably evaluate long, implicitly structured reasoning traces, and (3) how to use such evaluation signals for reasoning optimization. To address these challenges, we provide a unified perspective. (1) We introduce the ME$^2$ principle, which characterizes reasoning quality at the macro and micro levels in terms of efficiency and effectiveness. (2) Building on this principle, we model reasoning traces as directed acyclic graphs (DAGs) and develop a DAG-based pairwise evaluation method that captures complex reasoning structures. (3) Based on this method, we construct the TRM-Preference dataset and train a Thinking Reward Model (TRM) to evaluate reasoning quality at scale.…
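To make the idea of a reasoning trace as a DAG concrete, here is a minimal sketch. It assumes each reasoning step is a node and each edge records which earlier steps a step depends on. The step names, the `trace` example, and both helper functions are hypothetical. The "contributing steps" measure is only an illustration of what a graph view makes easy to compute, not the paper's evaluation method.

```python
from graphlib import CycleError, TopologicalSorter


def topological_order(deps: dict[str, set[str]]) -> list[str]:
    """Order steps so every step follows its dependencies.

    Raises graphlib.CycleError if the dependency graph is not a DAG.
    """
    return list(TopologicalSorter(deps).static_order())


def contributing_steps(deps: dict[str, set[str]], answer: str) -> set[str]:
    """Collect the answer node and every step it transitively depends on."""
    seen: set[str] = set()
    stack = [answer]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(deps.get(node, ()))
    return seen


# Hypothetical trace: each step maps to the set of steps it builds on.
trace = {
    "s1": set(),         # restate the problem
    "s2": {"s1"},        # productive approach
    "s3": {"s1"},        # explored branch that is later abandoned
    "s4": {"s2"},        # continue the productive approach
    "answer": {"s4"},
}

order = topological_order(trace)
used = contributing_steps(trace, "answer")
unused = set(trace) - used  # steps the final answer never relies on
```

Here `unused` contains only `s3`, the abandoned branch. A linear, token-by-token view of the trace would not separate that branch from the steps the answer actually depends on.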
Taxonomy
Topics: AI-based Problem Solving and Planning · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks
