From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization
Xinjie Chen, Minpeng Liao, Guoxin Chen, Chengxi Li, Biao Fu, Kai Fan, Xinggao Liu

TL;DR
This paper introduces LPPO, a progressive optimization framework for large language models that leverages high-quality demonstrations and dynamic sample weighting to improve reasoning capabilities efficiently.
Contribution
The paper proposes LPPO, a novel sample-centric approach combining prefix-guided sampling and learning-progress weighting to enhance LLM reasoning with limited high-quality data.
Findings
Outperforms strong baselines on mathematical reasoning benchmarks.
Achieves faster convergence and a higher performance ceiling.
Effectively leverages expert demonstrations for improved reasoning.
Abstract
Reinforcement learning with verifiable rewards (RLVR) has recently advanced the reasoning capabilities of large language models (LLMs). While prior work has emphasized algorithmic design, data curation, and reward shaping, we investigate RLVR from a sample-centric perspective and introduce LPPO (Learning-Progress and Prefix-guided Optimization), a framework of progressive optimization techniques. Our work addresses a critical question: how to best leverage a small set of trusted, high-quality demonstrations, rather than simply scaling up data volume. First, motivated by how hints aid human problem-solving, we propose prefix-guided sampling, an online data augmentation method that incorporates partial solution prefixes from expert demonstrations to guide the policy, particularly for challenging instances. Second, inspired by how humans focus on important questions aligned with their…
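The prefix-guided sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `prefix_guided_prompt`, the `pass_rate` difficulty signal, the `hard_threshold` cutoff, the prefix fraction range, and the whitespace tokenization are all assumptions made for illustration. The sketch only shows the general shape of the technique: when the policy struggles on a problem, a partial prefix of a trusted expert solution is appended to the prompt as a hint before sampling.

```python
import random


def prefix_guided_prompt(question, expert_solution, pass_rate,
                         hard_threshold=0.0, min_frac=0.2, max_frac=0.8,
                         rng=None):
    """Build the prompt the policy samples from (hypothetical helper).

    question        -- the problem text
    expert_solution -- a trusted demonstration for this problem
    pass_rate       -- fraction of recent rollouts that were correct
                       (assumed difficulty signal, not from the paper)
    hard_threshold  -- problems with pass_rate <= this count as challenging
    min_frac/max_frac -- assumed range for how much of the solution to reveal
    """
    rng = rng or random.Random()

    # Problems the policy already solves sometimes get no hint.
    if pass_rate > hard_threshold:
        return question

    # Whitespace tokenization stands in for a real tokenizer here.
    tokens = expert_solution.split()
    frac = rng.uniform(min_frac, max_frac)
    k = max(1, int(len(tokens) * frac))

    # Reveal only the first k tokens; the policy must finish the solution.
    prefix = " ".join(tokens[:k])
    return f"{question}\n{prefix}"


if __name__ == "__main__":
    q = "What is 3 + 2?"
    demo = "Let x = 3 . Then x + 2 = 5 . Answer: 5"
    print(prefix_guided_prompt(q, demo, pass_rate=0.0, rng=random.Random(0)))
```

In this sketch the hint length is drawn at random per call, so the same hard problem is seen with different amounts of guidance across training steps. How the paper actually chooses which instances receive prefixes and how long those prefixes are is not stated in the text above.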
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
