AdaSTaR: Adaptive Data Sampling for Training Self-Taught Reasoners

Woosung Koh; Wonbeen Oh; Jaein Jang; MinHyung Lee; Hyeongjin Kim; Ah Yeon Kim; Joonkee Kim; Junghyun Lee; Taehyeon Kim; Se-Young Yun

arXiv:2505.16322·cs.LG·October 7, 2025

Open Access

TL;DR

AdaSTaR is a novel adaptive sampling algorithm that improves the training efficiency and accuracy of self-improving reasoning language models by balancing data diversity and difficulty.

Contribution

Introduces AdaSTaR, an adaptive sampling method that enhances self-taught reasoning models by balancing data diversity and difficulty dynamically during training.

Findings

01. Achieves best test accuracy in all six benchmarks.

02. Reduces training FLOPs by an average of 58.6%.

03. Generalizes across different pre-trained LMs and larger models.

Abstract

Self-Taught Reasoners (STaR), synonymously known as Rejection sampling Fine-Tuning (RFT), is an integral part of the training pipeline of self-improving reasoning Language Models (LMs). The self-improving mechanism often employs random observation (data) sampling. However, this results in an imbalance in which observations are trained on: the model is inefficiently over-trained on already-solved examples and under-trained on challenging ones. In response, we introduce Adaptive STaR (AdaSTaR), a novel algorithm that rectifies this by integrating two adaptive sampling principles: (1) Adaptive Sampling for Diversity: promoting balanced training across observations, and (2) Adaptive Sampling for Curriculum: dynamically adjusting data difficulty to match the model's evolving strength. Across six benchmarks, AdaSTaR achieves the best test accuracy in all instances (6/6) and reduces training FLOPs by an average of 58.6% against an…
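The two sampling principles named in the abstract can be illustrated with a toy sampler. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify which statistics are tracked or how they are updated, so the `(last_sampled_step, solved_rate)` priority tuple, the function names `select_batch` and `update_stats`, and the squared-accuracy update rule are all hypothetical choices made for illustration.

```python
import heapq


def select_batch(stats, batch_size):
    """Diversity (assumed rule): pick the observations sampled longest ago,
    breaking ties in favour of those with the lowest solved rate.

    stats maps observation index -> (last_sampled_step, solved_rate).
    """
    return heapq.nsmallest(batch_size, stats, key=lambda i: stats[i])


def update_stats(stats, batch, solved, step, accuracy):
    """Curriculum (assumed rule): refresh the statistics of only a fraction
    of the batch, where the fraction grows with the model's current accuracy.

    A weak model refreshes few entries, so most of the batch keeps its old
    priority and is sampled again; a strong model moves on to new data.
    """
    n_update = int(len(batch) * accuracy ** 2)  # hypothetical schedule
    for i in batch[:n_update]:
        stats[i] = (step, 1.0 if solved[i] else 0.0)
    return n_update


# Toy run: four observations, none sampled yet.
stats = {i: (0, 0.0) for i in range(4)}

first = select_batch(stats, 2)
# Weak model (accuracy 0.5): int(2 * 0.25) == 0 entries are refreshed.
update_stats(stats, first, {0: False, 1: False}, step=1, accuracy=0.5)
repeat = select_batch(stats, 2)

# Strong model (accuracy 1.0): both entries are refreshed.
update_stats(stats, repeat, {0: True, 1: True}, step=2, accuracy=1.0)
moved_on = select_batch(stats, 2)
```

In this toy run the weak model is given the same two observations again, while the strong model moves on to the two observations it has not yet seen, which is the qualitative behaviour the two principles describe.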

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Intelligent Tutoring Systems and Adaptive Learning