Unified Data Selection for LLM Reasoning
Xiaoyuan Li, Yubo Ma, Chengpeng Li, Fengbin Zhu, Yiyao Yu, Keqin Bao, Wenjie Wang, Fuli Feng, Dayiheng Liu

TL;DR
The paper introduces High-Entropy Sum (HES), a training-free metric for selecting high-quality reasoning data to improve LLM training efficiency and effectiveness across various paradigms.
Contribution
HES provides a novel, computationally efficient way to distinguish high- from low-quality reasoning samples, enhancing LLM training without additional computational costs.
Findings
HES-ranked data improves SFT performance to match full dataset training.
HES-based approach outperforms baseline methods in RFT.
HES-selected trajectories lead to superior reasoning in RL.
Abstract
Effectively training Large Language Models (LLMs) for complex, long-CoT reasoning is often bottlenecked by the need for massive high-quality reasoning data. Existing methods are either computationally expensive or fail to reliably distinguish high- from low-quality reasoning samples. To address this, we propose High-Entropy Sum (HES), a training-free metric that quantifies reasoning quality by summing only the entropy of the top (e.g., 0.5\%) highest-entropy tokens in each reasoning sample. We validate HES across three mainstream training paradigms: Supervised Fine-tuning (SFT), Rejection Fine-tuning (RFT), and Reinforcement Learning (RL), with extensive results demonstrating its consistent effectiveness and significantly reduced computational overhead. In SFT, training on the top 20\% HES-ranked data matches full-dataset performance, while using the lowest-HES data degrades it. In RFT,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
