LearnAlign: Data Selection for LLM Reinforcement Learning with Improved Gradient Alignment

Shipeng Li; Zhiqin Yang; Shikun Li; Xiaobo Xia; Hengyu Liu; Xinghua Zhang; Gaode Chen; Dong Fang; Ying Tai; Zhe Peng

arXiv:2506.11480 · cs.LG · April 28, 2026

TL;DR

LearnAlign is a gradient-alignment-based data selection method that improves the training efficiency of reinforcement learning with verifiable rewards (RLVR) for LLMs by selecting learnable, representative reasoning data, substantially reducing the amount of training data required.

Contribution

It introduces a novel data learnability measure based on each data point's success rate, which addresses the response-length bias in gradient norms and improves data efficiency in RLVR training.
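To make the success-rate idea concrete, here is a minimal Python sketch. It assumes, as a hypothetical choice that this page does not confirm, that learnability is the Bernoulli variance p(1 - p) of the empirical success rate p over k sampled responses scored by a binary verifier. The function names are illustrative only. Because the score depends only on pass/fail outcomes, response length cannot influence it.

```python
from typing import Sequence


def success_rate(rewards: Sequence[int]) -> float:
    """Fraction of sampled responses that the verifier marked correct (reward 1)."""
    if not rewards:
        raise ValueError("need at least one sampled response")
    return sum(rewards) / len(rewards)


def learnability(rewards: Sequence[int]) -> float:
    """Hypothetical learnability score: p * (1 - p) for success rate p.

    It is 0 when a prompt is always solved (p = 1) or never solved (p = 0)
    and peaks at 0.25 when p = 0.5. Only binary outcomes enter the score,
    so the length of each response plays no role.
    """
    p = success_rate(rewards)
    return p * (1.0 - p)


if __name__ == "__main__":
    # Eight sampled responses per prompt, each scored 1 (correct) or 0 (incorrect).
    pool = {
        "easy": [1, 1, 1, 1, 1, 1, 1, 1],
        "hard": [0, 0, 0, 0, 0, 0, 0, 0],
        "frontier": [1, 0, 1, 0, 1, 1, 0, 0],
    }
    for name, rewards in pool.items():
        print(name, learnability(rewards))  # easy 0.0, hard 0.0, frontier 0.25
```

Under this assumption, prompts the model always or never solves get zero weight, and prompts near a 50% success rate are treated as the most learnable.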

Findings

01

Reduces training data by up to 1,000 data points on GSM8K.

02

Achieves 77.5% accuracy with less data, outperforming full-dataset training.

03

Demonstrates efficiency on mathematical and code benchmarks.

Abstract

Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing LLMs' reasoning abilities, yet its data inefficiency remains a major bottleneck. To address this critical yet challenging issue, we present a novel gradient-alignment-based method, named LearnAlign, which selects learnable and representative reasoning data for RLVR post-training. To overcome the well-known response-length bias in gradient norms, we introduce a data learnability measure based on the success rate, which indicates the learning potential of each data point. Experiments across five reasoning benchmarks show that our method significantly reduces training data requirements with only minor performance degradation, or even improved performance, compared to full-data training. Specifically, it reduces data requirements by up to 1,000 data points with better…
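The abstract does not spell out the selection rule, so the following pure-Python sketch shows only one plausible reading under stated assumptions. Per-example gradients are given as plain vectors (in practice they might be projected policy-gradient features; here they are toy numbers). Each vector is L2-normalized so that its magnitude, where the response-length bias lives, drops out. Its alignment with the mean normalized gradient of the pool is then measured by cosine similarity and weighted by a p(1 - p) learnability term. The functions `alignment_scores` and `select_top_k` and the scoring formula are hypothetical and are not the paper's definition.

```python
import math
from typing import List, Sequence


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a gradient vector to unit L2 norm (zero vectors stay zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else [0.0 for _ in vec]


def alignment_scores(grads: Sequence[Sequence[float]],
                     success_rates: Sequence[float]) -> List[float]:
    """Hypothetical score: p * (1 - p) * cosine(g_i, mean normalized gradient)."""
    units = [normalize(g) for g in grads]  # drop gradient magnitude
    dim = len(units[0])
    # Direction of the average unit gradient over the whole pool.
    mean_dir = normalize([sum(u[d] for u in units) / len(units) for d in range(dim)])
    scores = []
    for u, p in zip(units, success_rates):
        cosine = sum(a * b for a, b in zip(u, mean_dir))  # both are unit vectors
        scores.append(p * (1.0 - p) * cosine)
    return scores


def select_top_k(grads: Sequence[Sequence[float]],
                 success_rates: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring examples, best first."""
    scores = alignment_scores(grads, success_rates)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy pool: example 1 has a huge gradient (e.g. a long response) but is always solved.
    grads = [[2.0, 0.0], [50.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
    rates = [0.5, 1.0, 0.5, 0.5]
    print(select_top_k(grads, rates, k=2))  # [3, 0]
```

In this toy pool, example 1 has by far the largest raw gradient, yet it is never selected: normalization removes its size advantage, and its success rate of 1.0 gives it zero learnability weight.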
