A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning

Mengqi Li; Lei Zhao; Anthony Man-Cho So; Ruoyu Sun; Xiao Li

arXiv:2510.18814·cs.LG·May 18, 2026

TL;DR

This paper introduces SePT (Self-evolving Post-Training), a self-training method that lets language models improve their reasoning using only their own sampled responses, with no external rewards. Improvements are reported across six math reasoning benchmarks.

Contribution

The paper presents SePT, a self-evolving post-training approach that alternates between self-generation and training on the self-generated responses. It relies on an online data refresh mechanism, in which each new batch is generated by the most recently updated model.

Findings

01. SePT improves reasoning performance over a strong no-training baseline on six math benchmarks.

02. Online data refresh and temperature dynamics are crucial for success.

03. Self-training alone, without external rewards, can improve a model's reasoning performance.

Abstract

Can language models improve their reasoning performance without external rewards, using only their own sampled responses for training? We show that they can. We propose Self-evolving Post-Training (SePT), a simple post-training method that alternates between self-generation and training on self-generated responses. It repeatedly samples questions, uses the model itself to generate responses under a specified sampling temperature, and then trains the model on the self-generated data. In this self-training loop, we use an online data refresh mechanism, where each new batch is generated by the most recently updated model. Across six math reasoning benchmarks, SePT improves a strong no-training baseline, defined as the untuned base model evaluated at its best swept decoding temperature, on several tested models. Additional ablations demonstrate the importance of online data refresh and…
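
The loop described in the abstract (sample questions, generate responses at a chosen temperature, train on them, then regenerate from the updated model) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the "model" is just a table of logits over a few candidate answers per question, training is a single gradient step on the negative log-likelihood of the sampled responses, and every name here (`self_training_loop`, `generate`, `train_step`, and all hyperparameter values) is hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(logits, temperature, n_samples, rng):
    """Self-generation: sample response indices from the current toy model."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=n_samples)


def train_step(logits, responses, lr):
    """One gradient step on the mean negative log-likelihood of the responses."""
    probs = softmax(logits)  # the training loss is computed at temperature 1
    freq = [0.0] * len(logits)
    for r in responses:
        freq[r] += 1.0 / len(responses)
    # d(mean NLL) / d(logit_k) = p_k - (empirical frequency of answer k)
    return [x - lr * (p - f) for x, p, f in zip(logits, probs, freq)]


def self_training_loop(policy, temperature, rounds, batch_size,
                       n_samples, lr, seed=0):
    """Alternate self-generation and training; no reward or label is used."""
    rng = random.Random(seed)
    policy = {q: list(logits) for q, logits in policy.items()}  # work on a copy
    questions = sorted(policy)
    for _ in range(rounds):
        # Repeatedly sample a batch of questions.
        batch = rng.sample(questions, k=min(batch_size, len(questions)))
        # Online data refresh: every batch is generated by the latest model.
        data = {q: generate(policy[q], temperature, n_samples, rng)
                for q in batch}
        # Train the model on its own freshly generated responses.
        for q, responses in data.items():
            policy[q] = train_step(policy[q], responses, lr)
    return policy


if __name__ == "__main__":
    # Hypothetical starting logits for two questions with three answers each.
    start = {"q1": [2.0, 1.0, 0.0], "q2": [0.0, 1.5, 0.5]}
    end = self_training_loop(start, temperature=0.5, rounds=50,
                             batch_size=2, n_samples=64, lr=0.5)
    for q in sorted(start):
        print(q, "before:", softmax(start[q]), "after:", softmax(end[q]))
```

In this toy, sampling below temperature 1 yields data that is more concentrated than the model's own distribution, so each round shifts probability toward the currently preferred answer. That is a property of the sketch only; it says nothing about which temperatures or update rules the paper uses, and the no-training baseline (the untuned model at its best swept decoding temperature) is not modeled here.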
