Effective Learning for Small Reasoning Models: An Empirical Study on 0.5B Reasoning LLMs

Xialie Zhuang; Peixian Ma; Zhikai Jia; Zane Cao; Shiwei Liu

arXiv:2506.13404 · cs.AI · November 19, 2025

TL;DR

This paper empirically studies how different training strategies can improve the reasoning abilities of small language models with about 0.5 billion parameters, aiming to make them more effective on complex tasks while preserving their efficiency.

Contribution

The paper systematically evaluates supervised fine-tuning (SFT), knowledge distillation (KD), reinforcement learning (RL), and their hybrid combinations for 0.5B models, and offers insights and recommendations for improving their reasoning performance.

Findings

01. Hybrid training strategies improve reasoning accuracy.

02. Optimal training pipelines close the gap with larger models.

03. Small models can perform complex reasoning with proper training.

Abstract

The ongoing evolution of language models has led to the development of large-scale architectures that demonstrate exceptional performance across a wide range of tasks. However, these models come with significant computational and energy demands, as well as potential privacy implications. In this context, Small Reasoning Language Models (SRLMs) with approximately 0.5 billion parameters present a compelling alternative due to their remarkable computational efficiency and cost-effectiveness, particularly in resource-constrained environments. Despite these advantages, the limited capacity of 0.5 billion parameter models poses challenges in handling complex tasks such as mathematical reasoning. This research investigates various training strategies, including supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), as well as their hybrid implementations,…

Taxonomy

Methods: Knowledge Distillation