ReMiT: RL-Guided Mid-Training for Iterative LLM Evolution
Junjie Huang, Jiarui Qin, Di Yin, Weiwen Liu, Yong Yu, Xing Sun, Weinan Zhang

TL;DR
ReMiT introduces a novel RL-guided mid-training approach that dynamically reweights tokens to enhance reasoning capabilities in large language models, establishing a self-reinforcing cycle for iterative model improvement.
Contribution
The paper proposes ReMiT, a method leveraging reinforcement learning during mid-training to improve reasoning in LLMs without needing external teachers or reference models.
Findings
Achieves an average improvement of 3% across 10 benchmarks
Sustains gains of over 2% through post-training
Validates an iterative, self-reinforcing feedback loop for LLM evolution
Abstract
Standard training pipelines for large language models (LLMs) are typically unidirectional, progressing from pre-training to post-training. However, the potential for a bidirectional process--where insights from post-training retroactively improve the pre-trained foundation--remains unexplored. We aim to establish a self-reinforcing flywheel: a cycle in which a reinforcement learning (RL)-tuned model strengthens the base model, which in turn enhances subsequent post-training performance, requiring no specially trained teacher or reference model. To realize this, we analyze training dynamics and identify the mid-training (annealing) phase as a critical turning point for model capabilities. This phase typically occurs at the end of pre-training, utilizing high-quality corpora under a rapidly decaying learning rate. Building upon this insight, we introduce ReMiT (Reinforcement Learning-Guided…
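The summary above says ReMiT dynamically reweights tokens during mid-training using an RL-tuned model, but the abstract is cut off before it gives the actual rule. The sketch below is therefore only an illustration of what a token-reweighted loss can look like, not the authors' method. The weighting rule (a softmax over the per-token log-probability gap between the RL-tuned model and the base model), the `temperature` parameter, and the function names `token_weights` and `weighted_nll` are all assumptions made for this example.

```python
import math


def token_weights(base_logprobs, rl_logprobs, temperature=1.0):
    """Hypothetical weighting rule (an assumption, not taken from the paper).

    Tokens to which the RL-tuned model assigns a higher log-probability than
    the base model receive a larger weight. Weights are a softmax over the
    log-probability gaps, rescaled so that they average to 1.
    """
    if len(base_logprobs) != len(rl_logprobs) or not base_logprobs:
        raise ValueError("need two non-empty sequences of equal length")
    gaps = [(r - b) / temperature for b, r in zip(base_logprobs, rl_logprobs)]
    peak = max(gaps)  # subtract the maximum for numerical stability
    exps = [math.exp(g - peak) for g in gaps]
    total = sum(exps)
    n = len(gaps)
    return [n * e / total for e in exps]


def weighted_nll(base_logprobs, weights):
    """Token-weighted negative log-likelihood of the base model."""
    if len(base_logprobs) != len(weights) or not base_logprobs:
        raise ValueError("need two non-empty sequences of equal length")
    weighted = sum(w * lp for w, lp in zip(weights, base_logprobs))
    return -weighted / len(base_logprobs)


if __name__ == "__main__":
    # Made-up per-token log-probabilities for a three-token sequence.
    base = [-1.0, -2.0, -3.0]
    rl = [-1.0, -1.0, -3.0]
    w = token_weights(base, rl)
    print(w, weighted_nll(base, w))
```

Because the weights average to 1, the loss reduces to the ordinary mean negative log-likelihood whenever the two models agree up to a constant offset on every token, so the reweighting only shifts emphasis toward tokens where the RL-tuned model differs from the base model.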
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
