GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training
Junjie Li, Ziao Wang, NingXuan Ma, Jianghong Ma, Xiaofeng Zhang

TL;DR
GRACE is a data curation method that scores individual reasoning steps by gradient alignment and trajectory consistency, then aggregates those scores to select compact training subsets for post-training language models.
Contribution
GRACE introduces a scalable step-level scoring technique that relies only on the model's internal optimization signals, with no external reward models or step annotations, improving data efficiency on reasoning tasks.
Findings
GRACE achieves 108.8% of full-data performance with only 20% of the data.
It retains 100.2% of full-data performance with just 5% of the data.
Subsets selected by GRACE transfer effectively across different model backbones.
Abstract
Existing reasoning data curation pipelines score whole samples, treating every intermediate step as equally valuable. In reality, steps within a trace contribute very unevenly, and selecting reasoning data well requires assessing them individually. We present GRACE, a gradient-aligned curation method that views each reasoning trace as a sequence of optimization events and scores every step by two complementary signals: its alignment with the answer-oriented gradient direction, and its consistency with the preceding reasoning trajectory. Step-level scores are aggregated into a sample-level value for subset selection, using only the model's internal optimization signals and no external reward models or step annotations. To make this scalable, GRACE introduces a representation-level gradient proxy that estimates step-level alignment from token-level upstream signals in a single forward…
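The scoring idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not give the exact formulas, so the use of cosine similarity, the running sum of earlier step vectors as the "preceding trajectory", the mixing weight `lam`, the mean aggregation, and all function names are assumptions made for illustration. It also takes explicit per-step vectors as input, whereas GRACE is described as estimating them with a representation-level proxy.

```python
import numpy as np


def cosine(a, b, eps=1e-12):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def score_steps(step_grads, answer_grad, lam=0.5):
    """Score each reasoning step (hypothetical formulation).

    step_grads:  (T, d) array, one gradient-like vector per reasoning step.
    answer_grad: (d,) vector for the answer-oriented gradient direction.
    lam:         assumed weight between the two signals.
    """
    scores = []
    running = np.zeros_like(answer_grad, dtype=float)
    for t, g in enumerate(step_grads):
        # Signal 1: alignment with the answer-oriented direction.
        align = cosine(g, answer_grad)
        # Signal 2: consistency with the accumulated earlier steps
        # (defined as 0 for the first step, which has no history).
        consist = cosine(g, running) if t > 0 else 0.0
        scores.append(lam * align + (1.0 - lam) * consist)
        running = running + g
    return np.array(scores)


def sample_value(step_scores):
    """Aggregate step scores into one sample-level value (assumed: mean)."""
    return float(np.mean(step_scores))


def select_top_fraction(values, fraction):
    """Return sorted indices of the highest-valued samples."""
    k = max(1, int(round(fraction * len(values))))
    order = np.argsort(-np.asarray(values), kind="stable")
    return sorted(order[:k].tolist())


if __name__ == "__main__":
    answer_grad = np.array([1.0, 0.0])
    traces = [
        np.array([[1.0, 0.0], [1.0, 0.0]]),   # steps point toward the answer
        np.array([[-1.0, 0.0], [0.0, 1.0]]),  # steps point away or sideways
    ]
    values = [sample_value(score_steps(t, answer_grad)) for t in traces]
    print(values, select_top_fraction(values, 0.5))
```

In this toy setup, a trace whose steps point toward the answer direction and agree with one another receives a higher sample value than a trace whose steps point away from it, so it is the one kept when selecting half of the data.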
