ReMiT: RL-Guided Mid-Training for Iterative LLM Evolution

Junjie Huang; Jiarui Qin; Di Yin; Weiwen Liu; Yong Yu; Xing Sun; Weinan Zhang

arXiv:2602.03075 · cs.CL · February 4, 2026


Open Access

TL;DR

ReMiT introduces a novel RL-guided mid-training approach that dynamically reweights tokens to enhance reasoning capabilities in large language models, establishing a self-reinforcing cycle for iterative model improvement.
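The summary says ReMiT reweights tokens during mid-training but does not say how the weights are computed. The following is a purely illustrative sketch under a stated assumption: each token's weight comes from the gap between the log-probability that the RL-tuned model and the base model assign to the ground-truth token, normalized with a softmax over the sequence and rescaled to average 1. The functions `token_weights` and `weighted_nll` and this weighting rule are hypothetical and are not ReMiT's published formula.

```python
import math
from typing import List


def token_weights(base_logprobs: List[float], rl_logprobs: List[float],
                  temperature: float = 1.0) -> List[float]:
    """Hypothetical weighting rule (an assumption, not the paper's formula).

    Tokens where the RL-tuned model is more confident than the base model
    on the ground-truth token receive larger weights.
    """
    # Per-token log-probability gap (RL-tuned minus base), scaled by temperature.
    gaps = [(r - b) / temperature for b, r in zip(base_logprobs, rl_logprobs)]
    # Numerically stable softmax over the sequence.
    m = max(gaps)
    exps = [math.exp(g - m) for g in gaps]
    z = sum(exps)
    n = len(gaps)
    # Rescale so the weights average to 1, keeping the loss on the same
    # scale as plain cross-entropy. In a real training loop these weights
    # would be treated as constants (no gradient flows through them).
    return [n * e / z for e in exps]


def weighted_nll(base_logprobs: List[float], weights: List[float]) -> float:
    """Token-weighted negative log-likelihood of the model being trained."""
    return -sum(w * lp for w, lp in zip(weights, base_logprobs)) / len(weights)


# Toy example: the RL-tuned model is much more confident on token 1.
base = [-0.1, -2.0, -0.5]
rl = [-0.1, -0.5, -0.6]
w = token_weights(base, rl)
loss = weighted_nll(base, w)
```

With uniform weights, `weighted_nll` reduces to ordinary mean cross-entropy. Under this assumed rule, the reweighting only shifts emphasis toward tokens that the RL-tuned model has learned to predict better.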

Contribution

The paper proposes ReMiT, a method that uses a reinforcement learning (RL)-tuned model to guide mid-training and improve reasoning in LLMs, without requiring external teachers or reference models.

Findings

1. Achieves an average improvement of 3% across 10 benchmarks
2. Sustains gains of over 2% through post-training
3. Validates an iterative, self-reinforcing feedback loop for LLM evolution

Abstract

Standard training pipelines for large language models (LLMs) are typically unidirectional, progressing from pre-training to post-training. However, the potential for a bidirectional process, in which insights from post-training retroactively improve the pre-trained foundation, remains unexplored. We aim to establish a self-reinforcing flywheel: a cycle in which a reinforcement learning (RL)-tuned model strengthens the base model, which in turn enhances subsequent post-training performance, without requiring a specially trained teacher or reference model. To realize this, we analyze training dynamics and identify the mid-training (annealing) phase as a critical turning point for model capabilities. This phase typically occurs at the end of pre-training and uses high-quality corpora under a rapidly decaying learning rate. Building upon this insight, we introduce ReMiT (Reinforcement Learning-Guided…
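The abstract characterizes the annealing phase only as high-quality data under a rapidly decaying learning rate. The following is a minimal sketch of such a schedule. It assumes a constant learning rate for the main pre-training stage, followed by a linear decay to `min_lr` over the final `anneal_steps`, with warmup omitted for brevity. The function names, the linear shape, the corpus labels, and all parameter values are illustrative assumptions, not the paper's settings.

```python
def lr_at_step(step: int, total_steps: int, anneal_steps: int,
               peak_lr: float, min_lr: float) -> float:
    """Hypothetical stable-then-anneal learning-rate schedule."""
    if not 0 < anneal_steps <= total_steps:
        raise ValueError("anneal_steps must be in (0, total_steps]")
    anneal_start = total_steps - anneal_steps
    if step < anneal_start:
        # Main pre-training stage: hold the learning rate constant.
        return peak_lr
    # Mid-training (annealing) stage: decay linearly from peak_lr to
    # min_lr, clamping at min_lr if step runs past total_steps.
    progress = min(1.0, (step - anneal_start) / anneal_steps)
    return peak_lr + (min_lr - peak_lr) * progress


def corpus_for_step(step: int, total_steps: int, anneal_steps: int) -> str:
    """Hypothetical data switch: use the high-quality mix during annealing."""
    return "high_quality" if step >= total_steps - anneal_steps else "general"


# Example: 1000 total steps, of which the last 100 form the annealing phase.
schedule = [lr_at_step(s, 1000, 100, 3e-4, 3e-5) for s in range(1001)]
```

With these example values, the rate stays at `peak_lr` through step 900 and then falls linearly, reaching `min_lr` at step 1000. Replacing the linear ramp with a cosine or exponential decay changes only the final `return` line.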

