HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation

Feng Xiong; Hongling Xu; Yifei Wang; Runxi Cheng; Yong Wang; Xiangxiang Chu

arXiv:2505.19866 · cs.AI · September 30, 2025

TL;DR

This paper introduces HS-STaR, a hierarchical sampling method that improves self-taught reasoning in large language models by focusing on problems near the reasoning boundary, leading to better training data efficiency and performance.

Contribution

The paper proposes a novel hierarchical sampling framework that dynamically reallocates sampling budget based on difficulty estimation, enhancing self-taught reasoning in LLMs.

Findings

1. HS-STaR outperforms baseline methods across multiple benchmarks.
2. Focusing on boundary-level problems yields higher learning utility.
3. Dynamic budget reallocation improves training efficiency.

Abstract

Self-taught reasoners (STaRs) enhance the mathematical reasoning abilities of large language models (LLMs) by leveraging self-generated responses for self-training. Recent studies have incorporated reward models to guide response selection or decoding, aiming to obtain higher-quality data. However, they typically allocate a uniform sampling budget across all problems, overlooking the varying utility of problems at different difficulty levels. In this work, we conduct an empirical study and find that problems near the boundary of the LLM's reasoning capability offer significantly greater learning utility than both easy and overly difficult ones. To identify and exploit such problems, we propose HS-STaR, a Hierarchical Sampling framework for Self-Taught Reasoners. Given a fixed sampling budget, HS-STaR first performs lightweight pre-sampling with a reward-guided difficulty estimation…
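The two-stage idea in the abstract (pre-sample cheaply, estimate difficulty, then spend the rest of a fixed budget on boundary problems) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated before the details, so everything below is an assumption made for illustration, including the function names, the reward threshold of 0.5, the use of a pass rate as the difficulty signal, the rule that a problem is "boundary" when its pass rate is strictly between 0 and 1, and the even split of the remaining budget.

```python
from typing import Dict, List


def estimate_pass_rate(rewards: List[float], threshold: float = 0.5) -> float:
    """Fraction of pre-sampled responses whose reward clears the threshold.

    Hypothetical difficulty proxy: the threshold and the use of a pass
    rate are assumptions, not details taken from the paper.
    """
    if not rewards:
        raise ValueError("need at least one pre-sampled reward")
    return sum(r >= threshold for r in rewards) / len(rewards)


def is_boundary(pass_rate: float, low: float = 0.0, high: float = 1.0) -> bool:
    """Assumed rule: neither always solved (easy) nor never solved (hard)."""
    return low < pass_rate < high


def reallocate_budget(
    pre_rewards: Dict[str, List[float]], total_budget: int
) -> Dict[str, int]:
    """Return the number of extra samples to draw per problem.

    The pre-sampling cost is subtracted from the fixed total budget and
    the remainder is split as evenly as possible over boundary problems.
    """
    spent = sum(len(r) for r in pre_rewards.values())
    remaining = total_budget - spent
    if remaining < 0:
        raise ValueError("pre-sampling already exceeds the total budget")

    boundary = [
        pid for pid, r in pre_rewards.items()
        if is_boundary(estimate_pass_rate(r))
    ]
    extra = {pid: 0 for pid in pre_rewards}
    if not boundary:
        return extra

    share, leftover = divmod(remaining, len(boundary))
    for i, pid in enumerate(boundary):
        # The first `leftover` boundary problems absorb the remainder.
        extra[pid] = share + (1 if i < leftover else 0)
    return extra


# Toy rewards for three problems, four pre-samples each (made-up numbers).
pre_rewards = {
    "easy": [0.9, 0.8, 0.95, 0.9],
    "boundary": [0.9, 0.1, 0.2, 0.7],
    "hard": [0.1, 0.0, 0.2, 0.1],
}
extra_samples = reallocate_budget(pre_rewards, total_budget=24)
print(extra_samples)
```

With a uniform policy each of the three toy problems would receive the same share of the 24 samples. Here the 12 samples left after pre-sampling all go to the one problem whose pre-samples are mixed, which is the contrast with uniform allocation that the abstract draws.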

Code & Models

Datasets

Jing-Xun/HS-STaR
dataset · 27 downloads

Videos

HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation· underline

Taxonomy

Topics: Forecasting Techniques and Applications · Bayesian Modeling and Causal Inference · Statistical and Computational Modeling