Effective Learning for Small Reasoning Models: An Empirical Study on 0.5B Reasoning LLMs
Xialie Zhuang, Peixian Ma, Zhikai Jia, Zane Cao, Shiwei Liu

TL;DR
This paper empirically studies how different training strategies can improve the reasoning abilities of small 0.5-billion-parameter language models, aiming to make them more effective on complex tasks while preserving their efficiency.
Contribution
It systematically evaluates training methods such as supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL) for 0.5B models, and provides insights and recommendations for improving their reasoning performance.
Findings
Hybrid training strategies improve reasoning accuracy.
Optimal training pipelines close the gap with larger models.
Small models can perform complex reasoning with proper training.
Abstract
The ongoing evolution of language models has led to the development of large-scale architectures that demonstrate exceptional performance across a wide range of tasks. However, these models come with significant computational and energy demands, as well as potential privacy implications. In this context, Small Reasoning Language Models (SRLMs) with approximately 0.5 billion parameters present a compelling alternative due to their remarkable computational efficiency and cost-effectiveness, particularly in resource-constrained environments. Despite these advantages, the limited capacity of 0.5 billion parameter models poses challenges in handling complex tasks such as mathematical reasoning. This research investigates various training strategies, including supervised fine-tuning (SFT), knowledge distillation (KD), and reinforcement learning (RL), as well as their hybrid implementations,…
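One of the strategies named above, knowledge distillation, is often written as a weighted sum of a soft-target term (the KL divergence from a teacher's temperature-softened distribution to the student's, scaled by T²) and the usual hard-label cross-entropy. The sketch below shows that classic logit-distillation objective for a single prediction; for an LLM it would be applied per token over the vocabulary. It is a generic illustration under stated assumptions, not necessarily the distillation variant the authors use (the abstract does not say), and the function names `softmax` and `kd_loss` and the parameters `temperature` and `alpha` are hypothetical choices.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    target: int,
    temperature: float = 2.0,  # hypothetical default
    alpha: float = 0.5,        # hypothetical weight on the soft term
) -> float:
    """Classic logit distillation loss for one prediction (illustrative sketch).

    alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, target)
    """
    # Soft term: KL divergence between temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    soft = (temperature ** 2) * kl  # T^2 keeps gradient scale comparable

    # Hard term: ordinary cross-entropy against the ground-truth label (T = 1).
    hard = -math.log(softmax(student_logits)[target])

    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    student = [1.0, 2.0, 3.0]
    teacher = [0.5, 1.0, 4.0]
    print(kd_loss(student, teacher, target=2))
```

Setting `alpha=0` reduces the objective to plain supervised training on labels, while `alpha=1` trains purely on the teacher's soft targets; hybrid pipelines of the kind the paper studies trade off between such signals.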
Taxonomy
Methods: Knowledge Distillation
