Dynamic Sampling that Adapts: Self-Aware Iterative Data Persistent Optimization for Mathematical Reasoning

Jun Rao; Xuebo Liu; Hexuan Deng; Zepeng Lin; Zixiong Yu; Jiansheng Wei; Xiaojun Meng; Min Zhang

arXiv:2505.16176 · cs.AI · April 20, 2026

TL;DR

This paper introduces SAI-DPO, a dynamic data sampling framework for mathematical reasoning that adapts training data based on the model's evolving capabilities, improving efficiency and performance.

Contribution

SAI-DPO is the first dynamic sampling method that uses real-time feedback to align training data with the model's current competence in mathematical reasoning.
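The contribution above hinges on a feedback loop: re-measure what the model can currently solve, then pick the next training data from that measurement. Below is a minimal sketch of such a loop. It rests on several assumptions of ours that are not taken from the paper text shown here. Competence is treated as a per-problem pass rate over k sampled attempts. Problems the model always or never solves are skipped. The remaining problems are ranked by p(1 - p), which peaks at p = 0.5. `attempt` and `train_step` are hypothetical callbacks standing in for the model's sampler and for an SFT or RL update.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical callbacks, not an API from the paper:
#   attempt(problem_id, rng) -> True if one sampled solution is correct
#   train_step(batch)        -> one SFT or RL update on the chosen problems
Attempt = Callable[[str, random.Random], bool]
TrainStep = Callable[[List[str]], None]


def estimate_pass_rates(problems: Sequence[str], attempt: Attempt,
                        k: int = 8, seed: int = 0) -> Dict[str, float]:
    """Estimate the current model's pass rate on each problem from k attempts."""
    rng = random.Random(seed)
    return {pid: sum(attempt(pid, rng) for _ in range(k)) / k for pid in problems}


def select_batch(pass_rates: Dict[str, float], batch_size: int) -> List[str]:
    """Keep problems the model solves sometimes but not always, most uncertain first.

    A pass rate of 0 or 1 yields no mix of correct and incorrect solutions to
    contrast, so those problems are skipped; p * (1 - p) peaks at p = 0.5.
    """
    scored = [(p * (1 - p), pid) for pid, p in pass_rates.items() if 0 < p < 1]
    scored.sort(key=lambda item: (-item[0], item[1]))  # ties broken by id
    return [pid for _, pid in scored[:batch_size]]


def training_rounds(problems: Sequence[str], attempt: Attempt, train_step: TrainStep,
                    rounds: int = 3, batch_size: int = 2, k: int = 8) -> List[List[str]]:
    """Re-measure competence before every round so each batch tracks the model."""
    history = []
    for r in range(rounds):
        rates = estimate_pass_rates(problems, attempt, k=k, seed=r)
        batch = select_batch(rates, batch_size)
        train_step(batch)
        history.append(batch)
    return history
```

The pass rates are recomputed every round. As a result, a problem leaves the batch once the model solves it reliably. A previously unsolvable problem enters once the model starts getting it right some of the time.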

Findings

01. SAI-DPO outperforms static data-sampling baselines by nearly 6 points on multiple benchmarks.

02. SAI-DPO achieves state-of-the-art efficiency while using significantly less training data.

03. Experiments on eight benchmarks demonstrate SAI-DPO's effectiveness.

Abstract

In mathematical reasoning, data selection strategies predominantly rely on static, externally defined metrics, which fail to adapt to the evolving capabilities of models during training. This misalignment limits the efficiency of Supervised Fine-Tuning and Reinforcement Learning. To bridge this gap, we introduce SAI-DPO (Self-Aware Iterative Data Persistent Optimization), a dynamic sampling framework that aligns training data with the model's intrinsic competence. SAI-DPO operationalizes two novel metrics. Knowledge Semantic Alignment targets domain weaknesses. Self-Aware Difficulty is derived from pass rates and reasoning-path characteristics and gauges instance complexity relative to the model's current state. By iteratively recalibrating the data distribution based on real-time feedback, SAI-DPO dynamically aligns training samples with the model's evolving competence, ensuring…
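The abstract names the two metrics but does not give their formulas in this excerpt. The sketch below is one hypothetical instantiation, not the authors' definitions. Knowledge Semantic Alignment is approximated as the cosine similarity between a candidate problem's embedding and the mean embedding of problems the model recently failed. Self-Aware Difficulty is approximated as a weighted mix of (1 - pass rate) and a normalized count of reasoning steps in the model's own solutions. Several details are assumptions made for illustration: the function names `weakness_alignment`, `self_aware_difficulty`, and `score_pool`, the weight `alpha = 0.7`, and the use of a product to combine the two scores.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weakness_alignment(candidate: Vector, failed: List[Vector]) -> float:
    """Hypothetical Knowledge Semantic Alignment in [0, 1]: similarity between a
    candidate problem and the centroid of recently failed problems."""
    if not failed:
        return 0.0
    centroid = [sum(vec[i] for vec in failed) / len(failed) for i in range(len(candidate))]
    return max(0.0, cosine(candidate, centroid))


def self_aware_difficulty(pass_rate: float, mean_steps: float,
                          max_steps: float, alpha: float = 0.7) -> float:
    """Hypothetical Self-Aware Difficulty in [0, 1]: mostly 1 - pass rate, raised
    when the model's own solutions take more reasoning steps."""
    length_term = min(mean_steps / max_steps, 1.0) if max_steps > 0 else 0.0
    return alpha * (1.0 - pass_rate) + (1.0 - alpha) * length_term


def sampling_distribution(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize non-negative scores into probabilities (uniform if all zero)."""
    total = sum(scores.values())
    if total == 0:
        return {pid: 1.0 / len(scores) for pid in scores}
    return {pid: s / total for pid, s in scores.items()}


def score_pool(pool: Dict[str, Tuple[Vector, float, float]],
               failed: List[Vector], max_steps: float) -> Dict[str, float]:
    """pool maps problem id -> (embedding, pass rate, mean reasoning steps).
    Score = alignment with current weaknesses * self-aware difficulty."""
    scores = {
        pid: weakness_alignment(emb, failed) * self_aware_difficulty(p, steps, max_steps)
        for pid, (emb, p, steps) in pool.items()
    }
    return sampling_distribution(scores)
```

In this sketch, the distribution tracks the model's current state only if `failed`, the pass rates, and the step counts are recomputed after each training round before resampling from `score_pool`.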