Actor-Curator: Co-adaptive Curriculum Learning via Policy-Improvement Bandits for RL Post-Training

Zhengyao Gu; Jonathan Light; Raul Astudillo; Ziyu Ye; Langzhou He; Henry Peng Zou; Wei Cheng; Santiago Paternain; Philip S. Yu; Yisong Yue

arXiv:2602.20532 · cs.LG · February 25, 2026

Open Access

TL;DR

ACTOR-CURATOR is a scalable, automated curriculum learning framework for reinforcement learning post-training of large language models. It treats problem selection as a bandit problem and optimizes it to improve performance and training efficiency.

Contribution

The paper formulates problem selection as a non-stationary stochastic bandit problem and derives a loss function, based on online stochastic mirror descent, with regret guarantees under partial feedback.

Findings

01. Outperforms uniform sampling and baseline curricula on reasoning benchmarks.

02. Achieves up to 80% training speedup and significant performance gains.

03. Demonstrates improved training stability and efficiency.

Abstract

Post-training large foundation models with reinforcement learning typically relies on massive and heterogeneous datasets, making effective curriculum learning both critical and challenging. In this work, we propose ACTOR-CURATOR, a scalable and fully automated curriculum learning framework for reinforcement learning post-training of large language models (LLMs). ACTOR-CURATOR learns a neural curator that dynamically selects training problems from large problem banks by directly optimizing for expected policy performance improvement. We formulate problem selection as a non-stationary stochastic bandit problem, derive a principled loss function based on online stochastic mirror descent, and establish regret guarantees under partial feedback. Empirically, ACTOR-CURATOR consistently outperforms uniform sampling and strong curriculum baselines across a wide range of challenging reasoning…
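To make the bandit framing in the abstract concrete, here is a minimal sketch of online stochastic mirror descent over a small problem bank. It is not the paper's method: the paper trains a neural curator, whereas this sketch keeps one probability per problem, uses the negative-entropy mirror map (which gives an EXP3-style multiplicative update), and mixes in a uniform distribution as one simple way to keep exploring under non-stationarity. The names `osmd_step`, `simulate`, `gain`, `lr`, and `mix`, and the Bernoulli gains in the demo, are all assumptions made for illustration.

```python
import numpy as np


def osmd_step(probs, chosen, gain, lr=0.1, mix=0.05):
    """One entropic mirror-descent update on the probability simplex.

    probs  : current sampling distribution over problems (hypothetical, tabular)
    chosen : index of the problem that was actually sampled
    gain   : observed improvement signal for that problem (assumed in [0, 1])
    """
    k = len(probs)
    # Partial feedback: only the chosen problem's gain is observed, so we
    # build an importance-weighted estimate that is unbiased in expectation.
    gain_hat = np.zeros(k)
    gain_hat[chosen] = gain / probs[chosen]
    # Mirror descent with a negative-entropy regularizer reduces to a
    # multiplicative-weights update in log space.
    logits = np.log(probs) + lr * gain_hat
    logits -= logits.max()  # numerical stability before exponentiating
    new_probs = np.exp(logits)
    new_probs /= new_probs.sum()
    # Uniform mixing keeps every problem's probability at least mix / k.
    return (1.0 - mix) * new_probs + mix / k


def simulate(true_gain, steps=500, seed=0):
    """Run the sketch against made-up Bernoulli gains (illustrative only)."""
    rng = np.random.default_rng(seed)
    k = len(true_gain)
    probs = np.full(k, 1.0 / k)
    for _ in range(steps):
        chosen = rng.choice(k, p=probs)
        gain = float(rng.random() < true_gain[chosen])
        probs = osmd_step(probs, chosen, gain)
    return probs


if __name__ == "__main__":
    print(simulate(np.array([0.1, 0.2, 0.8, 0.3])))
```

The importance weighting is what lets the update work from a single observed gain per step; the uniform mixing bounds those weights by `k / mix`, trading a little exploitation for lower variance.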

Taxonomy

Topics: Topic Modeling · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications