Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training

Dake Bu; Wei Huang; Andi Han; Atsushi Nitanda; Hau-San Wong; Qingfu Zhang; Taiji Suzuki

arXiv:2511.07372·cs.LG·May 5, 2026

TL;DR

The paper develops a theoretical framework showing that, under sufficient conditions, curriculum strategies in LLM post-training yield exponential improvements in sample complexity on reasoning tasks. Empirical simulations support the analysis.

Contribution

The paper gives a formal analysis showing that curriculum post-training can improve sample complexity exponentially, and it models the base model's Chain-of-Thought generation as a state-conditioned autoregressive reasoning tree to explain the effect.

Findings

01 Curriculum strategies achieve polynomial sample complexity, whereas non-curriculum methods require exponential sample complexity.

02 Reinforcement learning finetuning with curricula improves reasoning accuracy.

03 Empirical simulations support the theoretical guarantees.
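A toy calculation can make the polynomial-versus-exponential gap in finding 01 more concrete. The sketch below is an illustrative assumption, not the paper's construction: it supposes a reasoning chain of depth `depth`, a base model that picks the correct next step independently with probability `p`, and a reward given only when the whole chain is correct. The function names are hypothetical.

```python
def expected_rollouts_no_curriculum(p: float, depth: int) -> float:
    """Expected rollouts until the first fully correct chain.

    Toy assumption: each step is right independently with probability p,
    so a full chain succeeds with probability p**depth (geometric waiting time).
    """
    return 1.0 / (p ** depth)


def expected_rollouts_depth_curriculum(p: float, depth: int) -> float:
    """Expected rollouts summed over depth-increasing stages.

    Toy assumption: earlier steps are already mastered at each stage,
    so every stage only has to discover one new step (1/p rollouts each).
    """
    return depth / p


if __name__ == "__main__":
    for d in (5, 10, 20):
        print(d,
              expected_rollouts_no_curriculum(0.5, d),
              expected_rollouts_depth_curriculum(0.5, d))
```

Under these simplifying assumptions the first quantity grows exponentially in the depth while the second grows linearly. The paper's actual conditions and rates should be taken from its theorems rather than from this sketch.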

Abstract

Recent curriculum techniques in the post-training stage of LLMs have been empirically observed to outperform non-curriculum approaches in improving reasoning performance, yet a principled understanding of their effectiveness and limitations remains incomplete. To bridge this gap, we develop an abstract theoretical framework and identify sufficient conditions under which curriculum post-training yields exponential improvements in sample complexity. To substantiate this framework, we model the base model's Chain-of-Thought generation as a state-conditioned autoregressive reasoning tree, and formalize curriculum subtasks as either depth-increasing curricula that progressively extend reasoning horizons or hint-decreasing curricula that gradually remove partial hints. Our analysis shows that reinforcement learning finetuning with both curriculum strategies achieves high accuracy with…
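The two curriculum types named in the abstract can be sketched as subtask generators over a single target path in a reasoning tree. This is a minimal illustration under stated assumptions: a path is represented as a list of child indices, a "hint" is taken to be a revealed prefix of the correct path, and the function names are hypothetical rather than taken from the paper or its repository.

```python
from typing import List, Tuple

Path = List[int]  # one child index per tree level (assumed representation)


def depth_increasing_stages(target: Path) -> List[Path]:
    """Stage k asks the model to produce the first k steps of the target path."""
    return [target[:k] for k in range(1, len(target) + 1)]


def hint_decreasing_stages(target: Path) -> List[Tuple[Path, Path]]:
    """Stage k reveals all but the last k steps as a hint.

    Each stage is a (hint, remainder) pair; the hint shrinks by one step
    per stage until the model must generate the full path unaided.
    """
    n = len(target)
    return [(target[: n - k], target[n - k:]) for k in range(1, n + 1)]


if __name__ == "__main__":
    path = [2, 0, 1]
    print(depth_increasing_stages(path))
    print(hint_decreasing_stages(path))
```

In both schedules the final stage coincides with the full task, so the curriculum only changes the order in which difficulty is introduced, not the end goal.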

Code & Models

Repositories

DakeBU/Curriculum-Post-training (GitHub)
