Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings

Safal Shrestha; Minwu Kim; Aadim Nepal; Anubhav Shrestha; Keith Ross

arXiv:2505.13718·cs.AI·February 2, 2026

TL;DR

This paper introduces a sample-efficient, two-stage training method for reasoning LLMs: a warmup phase that distills Long CoTs from Knights & Knaves logic puzzles, followed by RLVR on a limited set of target-domain examples, to improve reasoning and generalization when training data is scarce.

Contribution

The paper proposes a warmup strategy that distills Long CoTs from Knights & Knaves logic puzzles to build general reasoning skills, improving performance and sample efficiency when target-domain data is limited.

Findings

01 Warmup alone improves reasoning across multiple tasks.

02 Warmed-up models outperform base models when both are trained with RLVR on limited data.

03 Warmup maintains cross-domain generalization after domain-specific training.

Abstract

Designing effective reasoning-capable LLMs typically requires training using Reinforcement Learning with Verifiable Rewards (RLVR) or distillation with carefully curated Long Chain of Thoughts (CoT), both of which depend heavily on extensive training data. This creates a major challenge when the amount of quality training data is scarce. We propose a sample-efficient, two-stage training strategy to develop reasoning LLMs under limited supervision. In the first stage, we "warm up" the model by distilling Long CoTs from a toy domain, namely, Knights & Knaves (K&K) logic puzzles, to acquire general reasoning skills. In the second stage, we apply RLVR to the warmed-up model using a limited set of target-domain examples. Our experiments demonstrate that this two-phase approach offers several benefits: (i) the warmup phase alone facilitates generalized reasoning, leading to performance…
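
To make the toy domain concrete, here is a minimal sketch of a Knights & Knaves puzzle and of what a verifiable reward over it could look like. It is not the authors' code: the names `solve` and `verifiable_reward`, the encoding of statements as Python callables, and the binary 0/1 reward are all assumptions chosen for illustration. In a K&K puzzle every knight tells the truth and every knave lies, so a candidate answer can be checked mechanically by enumerating all role assignments.

```python
from itertools import product
from typing import Callable, Dict, List

# Hypothetical encoding: True means "knight" (truth-teller), False means "knave" (liar).
Assignment = Dict[str, bool]
Statement = Callable[[Assignment], bool]


def solve(names: List[str], statements: Dict[str, Statement]) -> List[Assignment]:
    """Return every role assignment consistent with all speakers' statements."""
    solutions = []
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        # A knight's statement must be true; a knave's statement must be false.
        if all(world[speaker] == stmt(world) for speaker, stmt in statements.items()):
            solutions.append(world)
    return solutions


def verifiable_reward(predicted: Assignment, names: List[str],
                      statements: Dict[str, Statement]) -> float:
    """Binary reward (an assumption): 1.0 only if the prediction is the unique solution."""
    solutions = solve(names, statements)
    return 1.0 if len(solutions) == 1 and predicted == solutions[0] else 0.0


# Example puzzle (made up for illustration):
#   A says: "B is a knave."
#   B says: "A and B are both knights."
names = ["A", "B"]
statements = {
    "A": lambda w: not w["B"],
    "B": lambda w: w["A"] and w["B"],
}

print(solve(names, statements))
print(verifiable_reward({"A": True, "B": False}, names, statements))
```

Because the answer is checkable by exhaustive search, puzzles of this kind give an unambiguous correctness signal, which is the property that RLVR relies on in any target domain.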

Code & Models

Repositories

safal312/warmup-before-you-train
PyTorch · Official

Models

🤗
safal312/qwen2.5-3b-kk-distilled
model · 1 dl

Datasets

safal312/knights_and_knaves_reasoning
dataset · 22 dl

Taxonomy

Methods: Balanced Selection · Sparse Evolutionary Training