GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning

Ningyuan Yang; Weihua Du; Weiwei Sun; Sean Welleck; Yiming Yang

arXiv:2602.21492 · cs.LG · February 26, 2026

TL;DR

GradAlign is a gradient-aligned data selection method for LLM reinforcement learning. It prioritizes training problems whose policy gradients align with gradients from a small, trusted validation set, which the authors report improves training stability and performance, especially in challenging data regimes.

Contribution

The paper proposes a gradient-alignment technique for selecting training problems in RL, targeting the non-stationarity and unreliable rewards that make LLM training sensitive to data quality.

Findings

01. GradAlign outperforms existing baselines in various challenging data regimes.

02. Using a small trusted validation set improves data selection quality.

03. Gradient alignment leads to more stable training and better final performance.

Abstract

Reinforcement learning (RL) has become a central post-training paradigm for large language models (LLMs), but its performance is highly sensitive to the quality of training problems. This sensitivity stems from the non-stationarity of RL: rollouts are generated by an evolving policy, and learning is shaped by exploration and reward feedback, unlike supervised fine-tuning (SFT) with fixed trajectories. As a result, prior work often relies on manual curation or simple heuristic filters (e.g., accuracy), which can admit incorrect or low-utility problems. We propose GradAlign, a gradient-aligned data selection method for LLM reinforcement learning that uses a small, trusted validation set to prioritize training problems whose policy gradients align with validation gradients, yielding an adaptive curriculum. We evaluate GradAlign across three challenging data regimes: unreliable reward…
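The selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how alignment is measured, so this example assumes cosine similarity between each training problem's gradient and the mean validation gradient, followed by a top-k cut. The function names (`select_aligned`, `cosine`, `mean_vector`), the problem identifiers, and the toy two-dimensional gradient vectors are all hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_aligned(train_grads, val_grads, k):
    """Rank training problems by alignment with the validation gradient.

    train_grads: dict mapping problem id -> flattened gradient (list of floats)
    val_grads:   list of flattened gradients from the trusted validation set
    k:           number of problems to keep
    Assumption: alignment = cosine similarity to the mean validation gradient.
    """
    g_val = mean_vector(val_grads)
    scores = {pid: cosine(g, g_val) for pid, g in train_grads.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy gradients (hypothetical values, for illustration only).
train_grads = {
    "p_aligned": [1.0, 0.5],      # points the same way as the validation gradient
    "p_orthogonal": [-0.5, 1.0],  # orthogonal to it
    "p_opposed": [-1.0, -0.5],    # points the opposite way
    "p_zero": [0.0, 0.0],         # no learning signal at all
}
val_grads = [[1.0, 0.4], [1.0, 0.6]]

selected, scores = select_aligned(train_grads, val_grads, k=2)
print(selected)
```

In an actual RL run the gradients would be policy-gradient estimates computed from rollouts of the current policy, so the scores would need to be recomputed as the policy evolves. That recomputation is what would make the resulting selection an adaptive curriculum rather than a one-time filter.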

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education