HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation
Feng Xiong, Hongling Xu, Yifei Wang, Runxi Cheng, Yong Wang, Xiangxiang Chu

TL;DR
This paper introduces HS-STaR, a hierarchical sampling method for self-taught reasoning in large language models. It concentrates the sampling budget on problems near the model's reasoning boundary, which the authors report improves both training data efficiency and performance.
Contribution
The paper proposes a hierarchical sampling framework that first estimates problem difficulty and then dynamically reallocates the sampling budget based on that estimate, with the goal of improving self-taught reasoning in LLMs.
Findings
HS-STaR outperforms baseline methods across multiple benchmarks.
Focusing on boundary-level problems yields higher learning utility.
Dynamic budget reallocation improves training efficiency.
Abstract
Self-taught reasoners (STaRs) enhance the mathematical reasoning abilities of large language models (LLMs) by leveraging self-generated responses for self-training. Recent studies have incorporated reward models to guide response selection or decoding, aiming to obtain higher-quality data. However, they typically allocate a uniform sampling budget across all problems, overlooking the varying utility of problems at different difficulty levels. In this work, we conduct an empirical study and find that problems near the boundary of the LLM's reasoning capability offer significantly greater learning utility than both easy and overly difficult ones. To identify and exploit such problems, we propose HS-STaR, a Hierarchical Sampling framework for Self-Taught Reasoners. Given a fixed sampling budget, HS-STaR first performs lightweight pre-sampling with a reward-guided difficulty estimation…
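The two stages named in the abstract, pre-sampling for difficulty estimation followed by budget reallocation, can be illustrated with a minimal sketch. The abstract is truncated before it gives details, so everything specific here is an assumption made for illustration: the function names `estimate_difficulty` and `allocate_budget`, the use of the mean reward as the difficulty signal, the thresholds 0.2 and 0.8, and the even split of the remaining budget across boundary-level problems. It is not the authors' implementation.

```python
from typing import Dict, List


def estimate_difficulty(rewards: List[float], low: float = 0.2, high: float = 0.8) -> str:
    """Label a problem from the rewards of its pre-sampled responses.

    Assumption: rewards lie in [0, 1] and their mean is the difficulty signal.
    The thresholds `low` and `high` are hypothetical, not taken from the paper.
    """
    if not rewards:
        raise ValueError("at least one pre-sampled reward is required")
    mean_reward = sum(rewards) / len(rewards)
    if mean_reward >= high:
        return "easy"      # the model already solves this reliably
    if mean_reward <= low:
        return "hard"      # likely beyond the model's current ability
    return "boundary"      # near the edge of the model's reasoning capability


def allocate_budget(pre_rewards: Dict[str, List[float]], total_budget: int) -> Dict[str, int]:
    """Return the number of extra samples each problem receives.

    Assumption: the budget left after pre-sampling is split evenly across
    boundary-level problems; easy and hard problems get no extra samples.
    """
    spent = sum(len(r) for r in pre_rewards.values())
    remaining = max(total_budget - spent, 0)
    boundary = [pid for pid, r in pre_rewards.items()
                if estimate_difficulty(r) == "boundary"]
    extra = {pid: 0 for pid in pre_rewards}
    if not boundary:
        return extra
    share, leftover = divmod(remaining, len(boundary))
    for i, pid in enumerate(boundary):
        # Hand out any remainder one sample at a time to the first problems.
        extra[pid] = share + (1 if i < leftover else 0)
    return extra


# Hypothetical pre-sampling results: four responses per problem.
pre_rewards = {
    "p1": [1.0, 1.0, 0.9, 1.0],   # easy
    "p2": [0.0, 0.1, 0.0, 0.0],   # hard
    "p3": [0.5, 0.4, 0.7, 0.2],   # boundary
    "p4": [0.6, 0.3, 0.9, 0.1],   # boundary
}
extra_samples = allocate_budget(pre_rewards, total_budget=30)
```

With a total budget of 30 and 16 samples already spent on pre-sampling, the remaining 14 samples go to the two boundary-level problems, 7 each. A uniform scheme would instead spread the same 14 samples across all four problems, including ones the model already solves or cannot yet solve.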
Taxonomy
Topics: Forecasting Techniques and Applications · Bayesian Modeling and Causal Inference · Statistical and Computational Modeling
