On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
Charlie Zhang, Graham Neubig, Xiang Yue

TL;DR
This paper develops a controlled experimental framework to disentangle the effects of pre-training, mid-training, and reinforcement learning on reasoning capabilities in language models, revealing their interplay and conditions for improvement.
Contribution
It introduces a systematic method to isolate and analyze the causal impacts of different training stages and RL on reasoning in language models.
Findings
RL improves reasoning only when pre-training leaves sufficient headroom.
Contextual generalization needs minimal pre-training exposure for effective transfer.
Mid-training significantly boosts performance compared to RL alone.
Abstract
Recent reinforcement learning (RL) techniques have yielded impressive reasoning improvements in language models, yet it remains unclear whether post-training truly extends a model's reasoning ability beyond what it acquires during pre-training. A central challenge is the lack of control in modern training pipelines: large-scale pre-training corpora are opaque, mid-training is often underexamined, and RL objectives interact with unknown prior knowledge in complex ways. To resolve this ambiguity, we develop a fully controlled experimental framework that isolates the causal contributions of pre-training, mid-training, and RL-based post-training. Our approach employs synthetic reasoning tasks with explicit atomic operations, parseable step-by-step reasoning traces, and systematic manipulation of training distributions. We evaluate models along two axes: extrapolative generalization to more…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
