Reinforcement Pre-Training

Qingxiu Dong; Li Dong; Yao Tang; Tianzhu Ye; Yutao Sun; Zhifang Sui; Furu Wei

arXiv:2506.08007 · cs.CL · June 10, 2025

TL;DR

Reinforcement Pre-Training (RPT) introduces a scalable RL-based approach to improving language models by framing next-token prediction as a reasoning task. This improves next-token prediction accuracy and provides a strong foundation for further RL fine-tuning.

Contribution

RPT presents a novel scaling paradigm that leverages RL for language model pre-training, emphasizing reasoning and reward-based training on large text datasets.
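Because each target is taken from the text itself, any unlabeled corpus can in principle be turned into RL training examples. The Python sketch below illustrates that idea. It assumes a pre-tokenized document, and `NextTokenTask` and `make_tasks` are hypothetical names. How RPT actually selects and formats positions is not described here.

```python
from dataclasses import dataclass


@dataclass
class NextTokenTask:
    context: str  # prefix the policy conditions on
    target: str   # ground-truth next token, seen only by the reward verifier


def make_tasks(tokens: list[str], min_context: int = 1) -> list[NextTokenTask]:
    """Turn one tokenized document into next-token prediction tasks.

    Every position from `min_context` onward yields a task whose answer is
    taken from the document itself, so no human labels are required.
    (Hypothetical helper for illustration only.)
    """
    return [
        NextTokenTask(context="".join(tokens[:t]), target=tokens[t])
        for t in range(min_context, len(tokens))
    ]


# Hypothetical pre-tokenized document; leading spaces belong to the tokens.
doc = ["The", " cat", " sat", " on", " the", " mat", "."]
for task in make_tasks(doc, min_context=3):
    print(repr(task.context), "->", repr(task.target))
```

With `min_context=3` this yields four tasks. The first is `'The cat sat' -> ' on'` and the last is `'The cat sat on the mat' -> '.'`.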

Findings

01 RPT significantly improves next-token prediction accuracy.

02 Increasing training compute consistently improves next-token prediction accuracy.

03 RPT provides a strong pre-trained foundation for further reinforcement fine-tuning.

Abstract

In this work, we introduce Reinforcement Pre-Training (RPT) as a new scaling paradigm for large language models and reinforcement learning (RL). Specifically, we reframe next-token prediction as a reasoning task trained using RL, in which the model receives verifiable rewards for correctly predicting the next token for a given context. RPT offers a scalable method to leverage vast amounts of text data for general-purpose RL, rather than relying on domain-specific annotated answers. By incentivizing the capability of next-token reasoning, RPT significantly improves next-token prediction accuracy. Moreover, RPT provides a strong pre-trained foundation for further reinforcement fine-tuning. The scaling curves show that increased training compute consistently improves next-token prediction accuracy. The results position RPT as an effective and promising scaling…
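"Verifiable rewards" here means that a rule can compute the reward by checking the model's answer against the text that actually follows the context. The sketch below rests on two illustrative assumptions. First, the model ends its reasoning with a `\boxed{...}` answer. Second, a prediction earns reward only if it is a non-empty prefix of the ground-truth continuation that ends on a token boundary. All function names are hypothetical. This is not presented as RPT's published reward definition.

```python
import re
from typing import Optional

# Assumed output convention: the policy reasons freely, then writes its final
# prediction inside \boxed{...}. This format is an illustrative assumption.
ANSWER_RE = re.compile(r"\\boxed\{(.*?)\}", re.DOTALL)


def extract_prediction(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in the completion, or None."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


def token_end_offsets(tokens: list[str]) -> set[int]:
    """Character offsets, relative to the continuation start, where each token ends."""
    ends, pos = set(), 0
    for tok in tokens:
        pos += len(tok)
        ends.add(pos)
    return ends


def prefix_match_reward(completion: str, continuation_tokens: list[str]) -> float:
    """Binary reward (assumed rule): 1.0 if the boxed prediction is a non-empty
    prefix of the ground-truth continuation that ends exactly on a token
    boundary, otherwise 0.0."""
    pred = extract_prediction(completion)
    if not pred:  # missing or empty answer earns nothing
        return 0.0
    gold = "".join(continuation_tokens)
    if not gold.startswith(pred):
        return 0.0
    return 1.0 if len(pred) in token_end_offsets(continuation_tokens) else 0.0


# Ground-truth tokens following the context "The cat sat".
continuation = [" on", " the", " mat", "."]
print(prefix_match_reward(r"It is probably a preposition. \boxed{ on}", continuation))  # 1.0
print(prefix_match_reward(r"\boxed{ on the}", continuation))  # 1.0 (two whole tokens)
print(prefix_match_reward(r"\boxed{ o}", continuation))       # 0.0 (splits a token)
print(prefix_match_reward(r"\boxed{ in}", continuation))      # 0.0 (wrong word)
```

Accepting any correct prefix that ends on a token boundary lets a prediction cover one or several tokens while still rejecting answers that stop mid-token. A simpler alternative would be an exact-match check against the single next token.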

Code & Models

Models

ykarout/RPT-DeepSeek-R1-0528-Qwen3-8B (Hugging Face model)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification