Unified Data Selection for LLM Reasoning

Xiaoyuan Li; Yubo Ma; Chengpeng Li; Fengbin Zhu; Yiyao Yu; Keqin Bao; Wenjie Wang; Fuli Feng; Dayiheng Liu

arXiv:2605.22389 · cs.CL · May 22, 2026

TL;DR

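The paper introduces High-Entropy Sum (HES), a training-free metric for selecting high-quality reasoning data. HES-based selection improves both the efficiency and the effectiveness of LLM training across three paradigms: supervised fine-tuning (SFT), rejection fine-tuning (RFT), and reinforcement learning (RL).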

Contribution

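HES is a new, computationally efficient way to distinguish high- from low-quality reasoning samples. It requires no training: each sample is scored by summing only the entropy of its highest-entropy tokens. According to the authors, it does so at significantly lower computational overhead than existing selection methods.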

Findings

1. In SFT, training on the top 20% of HES-ranked data matches full-dataset performance.
2. In RFT, the HES-based approach outperforms baseline methods.
3. In RL, training on HES-selected trajectories yields superior reasoning performance.
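Finding 01 amounts to a ranking-and-truncation step over the training set. The Python sketch below illustrates that step under stated assumptions. It assumes per-sample HES scores have already been computed. The helper `select_by_hes`, its `fraction` and `highest` parameters, and the rounding rule are hypothetical illustrations, not the authors' code. The abstract reports that keeping the top 20% matches full-dataset SFT performance and that keeping the lowest-HES data degrades it. The `highest` flag covers both sides of that comparison.

```python
import numpy as np


def select_by_hes(scores, fraction=0.2, highest=True):
    """Return indices of the top (or bottom) `fraction` of samples by HES score.

    Hypothetical helper: the rounding rule (round to nearest, keep at least
    one sample) is an assumption of this sketch, not taken from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    k = max(1, int(round(fraction * n)))
    # Sort descending for the high-HES subset, ascending for the low-HES one;
    # a stable sort keeps tied samples in their original order.
    order = np.argsort(-scores if highest else scores, kind="stable")
    return order[:k].tolist()
```

For example, `select_by_hes(scores, fraction=0.2)` returns the indices of the 20% highest-scoring samples, and `highest=False` returns the lowest-scoring 20% for the contrasting baseline.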

Abstract

Effectively training Large Language Models (LLMs) for complex, long-CoT reasoning is often bottlenecked by the need for massive high-quality reasoning data. Existing methods are either computationally expensive or fail to reliably distinguish high- from low-quality reasoning samples. To address this, we propose High-Entropy Sum (HES), a training-free metric that quantifies reasoning quality by summing only the entropy of the top (e.g., 0.5%) highest-entropy tokens in each reasoning sample. We validate HES across three mainstream training paradigms: Supervised Fine-tuning (SFT), Rejection Fine-tuning (RFT), and Reinforcement Learning (RL), with extensive results demonstrating its consistent effectiveness and significantly reduced computational overhead. In SFT, training on the top 20% HES-ranked data matches full-dataset performance, while using the lowest-HES data degrades it. In RFT,…
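The HES score described in the abstract can be sketched in Python as follows. This is a minimal illustration under assumptions the abstract does not settle. Per-token entropy is taken to be the Shannon entropy (in nats) of the model's next-token distribution at each position of the reasoning sample, computed from logits of shape `(seq_len, vocab_size)`. The number of kept tokens is rounded up, with a minimum of one. The function names `token_entropies` and `high_entropy_sum` are hypothetical. The default `top_ratio=0.005` corresponds to the "e.g., 0.5%" in the abstract.

```python
import math

import numpy as np


def token_entropies(logits: np.ndarray) -> np.ndarray:
    """Shannon entropy (nats) of the next-token distribution at each position.

    Assumption: `logits` has shape (seq_len, vocab_size) and comes from the
    model scoring the reasoning sample; the paper's exact setup may differ.
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=-1)


def high_entropy_sum(entropies: np.ndarray, top_ratio: float = 0.005) -> float:
    """Sum of the entropies of the top `top_ratio` fraction of tokens.

    Rounding k up and keeping at least one token is an assumption of this sketch.
    """
    n = entropies.shape[0]
    if n == 0:
        return 0.0
    k = max(1, math.ceil(top_ratio * n))
    # np.partition moves the k largest values into the last k slots (unordered).
    top_k = np.partition(entropies, n - k)[n - k:]
    return float(top_k.sum())


# Usage: score one synthetic 1000-token sample (random logits, illustration only).
rng = np.random.default_rng(0)
sample_logits = rng.normal(size=(1000, 50))
score = high_entropy_sum(token_entropies(sample_logits))  # sums the 5 highest entropies
```

Under these assumptions, scoring a sample needs only the logits from one forward pass and no gradient updates. Summing a small top slice, rather than averaging over all tokens, keeps the score focused on the few high-uncertainty positions instead of the many low-entropy ones.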
