Scheduling Your LLM Reinforcement Learning with Reasoning Trees

Hong Wang; Zhezheng Hao; Jian Luo; Chenxing Wei; Yao Shu; Lei Liu; Qiang Lin; Hande Dong; Jiawei Chen

arXiv:2510.24832·cs.AI·April 28, 2026

TL;DR

This paper introduces a new reasoning tree-based metric and scheduling algorithm for reinforcement learning with verifiable rewards to improve large language model performance on math reasoning tasks.

Contribution

The paper proposes the Reasoning Score (r-score) metric and the Re-Schedule algorithm, which use reasoning tree structure to schedule data more effectively in RLVR.

Findings

01. Re-Schedule improves average accuracy by up to 3.2% on six benchmarks.

02. Accounting for the structure of the reasoning tree improves RLVR data scheduling.

03. Re-Schedule shows significant gains over scheduling based on path-based metrics.

Abstract

Using Reinforcement Learning with Verifiable Rewards (RLVR) to optimize Large Language Models (LLMs) can be conceptualized as progressively editing a query's 'Reasoning Tree'. This process involves exploring nodes (tokens) and dynamically modifying the model's policy at each node. When combined with data scheduling, this process yields further gains in data efficiency and accuracy. However, existing RLVR data scheduling methods typically rely on path-based metrics to rank queries, overlooking the reasoning tree structures of these queries. In this paper, we introduce a novel metric, namely Reasoning Score (r-score), which measures the query's learning difficulty based on the structure of its reasoning tree. Based on the r-score, we propose the Reasoning Tree Schedule (Re-Schedule), a scheduling algorithm that constructs a curriculum progressing from structurally simple (high r-score) to…
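The curriculum idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the text here does not say how the r-score is computed, so the scores below are placeholder numbers, and the names `Query`, `r_score`, `build_curriculum`, and `num_stages` are hypothetical. The only detail taken from the abstract is that the curriculum starts with high r-score (structurally simple) queries; continuing in descending order and splitting into equal stages are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Query:
    text: str
    # Hypothetical precomputed score; the r-score formula is not given in this text.
    r_score: float


def build_curriculum(queries: List[Query], num_stages: int) -> List[List[Query]]:
    """Sort queries by descending r_score and split them into at most
    num_stages contiguous stages (assumed staging scheme)."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if not queries:
        return []
    # High r-score first, matching "structurally simple (high r-score)" in the abstract.
    ordered = sorted(queries, key=lambda q: q.r_score, reverse=True)
    stage_size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]


if __name__ == "__main__":
    # Placeholder queries and scores, for illustration only.
    pool = [Query("a", 0.2), Query("b", 0.9), Query("c", 0.5), Query("d", 0.7)]
    for index, stage in enumerate(build_curriculum(pool, num_stages=2)):
        print(index, [q.text for q in stage])
```

In an RLVR loop, each stage would be fed to the trainer in order, so earlier updates see the queries this sketch treats as simplest.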
