Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback

Jiaye Lin; Mengdi Li; Xufeng Zhao; Wenhao Lu; Peilin Zhao; Stefan Wermter; Di Wang

arXiv:2505.20075·cs.AI·April 21, 2026

TL;DR

Curriculum-RLAIF introduces a data-centric curriculum approach to improve reward model generalizability in reinforcement learning from AI feedback, leading to better policy alignment without extra inference costs.

Contribution

The paper proposes a curriculum framework that constructs preference pairs of varying difficulty and arranges them into a curriculum for reward model training, with the aim of improving reward model generalizability.

Findings

01. Reward models trained with Curriculum-RLAIF outperform non-curriculum baselines.

02. The approach improves policy alignment performance significantly.

03. Curriculum-RLAIF is simpler, more efficient, and more effective than alternative strategies.

Abstract

Reward models trained through Reinforcement Learning from AI Feedback (RLAIF) methods frequently suffer from limited generalizability, which hinders the alignment performance of policy models. This challenge stems from various issues, including distribution shift, preference label noise, and mismatch of overly challenging samples with model capacity. In this paper, we aim to enhance the generalizability of reward models through a data-centric approach, driven by the insight that these issues are inherently intertwined from a uniform perspective of data difficulty. Accordingly, we propose a novel framework, Curriculum-RLAIF, which constructs preference pairs with varying difficulty levels and then produces a specific curriculum for reward model training. Comprehensive experimental results suggest that reward models trained with Curriculum-RLAIF achieve improved generalizability, boosting…
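
As a rough illustration of the idea in the abstract (preference pairs of varying difficulty, ordered into a curriculum for reward model training), the sketch below sorts pairs from easy to hard and splits them into training stages. It is not the authors' implementation: the abstract does not say how difficulty is measured or how the curriculum is scheduled, so the `PreferencePair` type, its `difficulty` score, and the `curriculum_stages` helper are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class PreferencePair:
    """One AI-labeled preference pair (hypothetical container, not from the paper)."""
    prompt: str
    chosen: str
    rejected: str
    difficulty: float  # assumed score; higher means harder to judge


def curriculum_stages(
    pairs: List[PreferencePair], num_stages: int
) -> Iterator[List[PreferencePair]]:
    """Yield groups of pairs ordered from easiest to hardest.

    Assumption: a simple easy-to-hard schedule with roughly equal stage sizes.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(pairs, key=lambda p: p.difficulty)
    # Ceiling division so that every pair lands in exactly one stage.
    stage_size = max(1, -(-len(ordered) // num_stages))
    for start in range(0, len(ordered), stage_size):
        yield ordered[start:start + stage_size]


if __name__ == "__main__":
    demo = [
        PreferencePair("q1", "good", "bad", difficulty=0.9),
        PreferencePair("q2", "good", "bad", difficulty=0.1),
        PreferencePair("q3", "good", "bad", difficulty=0.5),
        PreferencePair("q4", "good", "bad", difficulty=0.3),
    ]
    for i, stage in enumerate(curriculum_stages(demo, num_stages=2)):
        # A reward model would be trained on each stage in turn.
        print(i, [p.prompt for p in stage])
```

In such a setup, each stage would be fed to the reward model trainer in order, so the model sees easy pairs before hard ones. Because the ordering happens on the training data alone, this kind of schedule adds no cost at inference time, which is consistent with the data-centric framing above.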
