PRISM: Demystifying Retention and Interaction in Mid-Training
Bharat Runwal, Ashish Agrawal, Anurag Roy, Rameswar Panda

TL;DR
This paper introduces PRISM, an empirical study showing that mid-training on high-quality data substantially improves the reasoning abilities of large language models, and analyzing how training choices, data composition, and reinforcement learning (RL) shape the outcome.
Contribution
PRISM provides a comprehensive analysis of mid-training design choices, shows that mid-training is effective for reasoning tasks, and offers practical guidance for building robust training pipelines.
Findings
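Mid-training on approximately 27B high-quality tokens yields gains of +15 to +40 points on math, +5 to +12 on code, and +6 to +13 on science benchmarks while preserving general performance.
The full PRISM to RL pipeline raises the macro-average across six reasoning benchmarks from under 12 to 29-42.
Data composition matters more during mid-training than during RL.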
Abstract
We present PRISM, a comprehensive empirical study of mid-training design choices for large language models. Through controlled experiments across seven base models spanning four families (Granite, LLaMA, Mistral, Nemotron-H), two architecture types (dense Transformer and attention-Mamba hybrid), and scales from 3B to 24B parameters, we show that mid-training on approximately 27B high-quality tokens yields consistent gains of +15 to +40 points on math, +5 to +12 points on code, and +6 to +13 points on science benchmarks while preserving general performance. The full PRISM to RL pipeline improves the macro-average across six reasoning benchmarks from under 12 to 29-42 (a 3-4x improvement), whereas RL applied directly to most base models remains substantially less effective, with AIME scores near zero. Data composition matters most at mid-training, not RL: including science data during…
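The abstract reports reasoning results as a macro-average over six benchmarks and summarizes the gain as a fold improvement. Assuming the standard definition, a macro-average is the unweighted mean of per-benchmark scores, so each benchmark counts equally regardless of how many problems it contains. The minimal Python sketch below illustrates both quantities; the benchmark names (`bench_1` ... `bench_6`) and all scores are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean of per-benchmark scores (each benchmark weighted equally)."""
    if not scores:
        raise ValueError("need at least one benchmark score")
    return mean(scores.values())


def fold_improvement(before: float, after: float) -> float:
    """Ratio of the final macro-average to the baseline macro-average."""
    if before <= 0:
        raise ValueError("baseline must be positive to form a ratio")
    return after / before


# Hypothetical placeholder scores for six benchmarks (NOT numbers from the paper).
base = {f"bench_{i}": s for i, s in enumerate([20.0, 15.0, 10.0, 8.0, 6.0, 0.0], start=1)}
after_pipeline = {f"bench_{i}": s for i, s in enumerate([60.0, 50.0, 40.0, 30.0, 20.0, 16.0], start=1)}

if __name__ == "__main__":
    b, a = macro_average(base), macro_average(after_pipeline)
    print(f"baseline {b:.2f} -> final {a:.2f} ({fold_improvement(b, a):.2f}x)")
```

Because the mean is unweighted, a benchmark with only a few dozen problems moves the macro-average as much as a much larger one; a micro-average (pooling all problems) would weight benchmarks by size instead.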
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Materials Science
