Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards

Tianyang Han; Hengyu Shi; Junjie Hu; Xu Yang; Zhiling Wang; Junhao Su

arXiv:2605.03862 · cs.AI · May 8, 2026

TL;DR

This paper introduces TraceLift, a planner-executor training framework that uses executor-grounded rewards to score intermediate reasoning traces for both their quality and their usefulness to the model that consumes them.

Contribution

The paper proposes a planner-executor training approach built around a rubric-based Reasoning Reward Model, and introduces a new dataset for training and evaluating reasoning quality.

Findings

01 Executor-grounded rewards outperform execution-only training.

02 The approach improves reasoning quality in math and code benchmarks.

03 The dataset enables direct learning of reasoning quality.

Abstract

Reinforcement learning with verifiable rewards has become a common way to improve explicit reasoning in large language models, but final-answer correctness alone does not reveal whether the reasoning trace is faithful, reliable, or useful to the model that consumes it. This outcome-only signal can reinforce traces that are right for the wrong reasons, overstate reasoning gains by rewarding shortcuts, and propagate flawed intermediate states in multi-step systems. To this end, we propose TraceLift, a planner-executor training framework that treats reasoning as a consumable intermediate artifact. During planner training, the planner emits tagged reasoning. A frozen executor turns this reasoning into the final artifact for verifier feedback, while an executor-grounded reward shapes the intermediate trace. This reward multiplies a rubric-based Reasoning Reward Model (RM) score by measured…
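The data flow described in the abstract (planner emits tagged reasoning, a frozen executor consumes it, a verifier checks the result, and a separate reward scores the trace) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The `<reasoning>` tag name, the function names, and the zero reward for a missing tag are all assumptions. The abstract is cut off before it names the quantity that multiplies the RM score, so that factor is left as a caller-supplied placeholder, `measured_factor`. How the two signals are combined is also not stated, so they are returned separately.

```python
import re
from typing import Callable, Dict, Optional


def extract_reasoning(planner_output: str, tag: str = "reasoning") -> Optional[str]:
    """Return the text inside <tag>...</tag>, or None if the tag is absent.

    The tag name is hypothetical; the source only says the reasoning is "tagged".
    """
    match = re.search(rf"<{tag}>(.*?)</{tag}>", planner_output, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def planner_step_rewards(
    planner_output: str,
    executor: Callable[[str], str],           # frozen: never updated during planner training
    verifier: Callable[[str], bool],          # checks the executor's final artifact
    rm_score: Callable[[str], float],         # rubric-based Reasoning RM score for the trace
    measured_factor: Callable[[str], float],  # placeholder: the source text is truncated here
) -> Dict[str, float]:
    """Compute the two reward signals for one planner sample."""
    reasoning = extract_reasoning(planner_output)
    if reasoning is None:
        # Assumption: a sample without tagged reasoning earns nothing.
        return {"outcome": 0.0, "trace": 0.0}

    # The executor turns the planner's reasoning into the final artifact.
    artifact = executor(reasoning)
    outcome = 1.0 if verifier(artifact) else 0.0

    # Executor-grounded trace reward: RM score multiplied by the measured quantity.
    trace = rm_score(reasoning) * measured_factor(reasoning)
    return {"outcome": outcome, "trace": trace}
```

Keeping `outcome` and `trace` separate makes the abstract's point visible: a sample can pass the verifier while its trace reward stays low.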

Code & Models

Repositories

MasaiahHan/TraceLift (GitHub)
