Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement

Xuechen Zhang; Zijian Huang; Chenshun Ni; Ziyang Xiong; Jiasi Chen; Samet Oymak

arXiv:2505.07961·cs.LG·May 26, 2025

TL;DR

This paper introduces algorithms that make reasoning in small language models more token-efficient by controlling where the thinking phase stops and how long the reasoning trace runs. The reported reduction in token usage is about 50% for TLDR, without sacrificing accuracy.

Contribution

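The paper proposes two methods for small models: temperature scaling (TS), which controls the stopping point of the thinking phase and thereby the trace length, and TLDR, a length-regularized reinforcement learning method. Both aim to make reasoning more efficient and its length easier to control.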
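To make the idea of steering the stopping point concrete, here is a minimal sketch of one possible reading of temperature scaling: only the logit of a stop token (the token that ends the thinking phase) is divided by a temperature before the softmax. The abstract does not specify which logits are scaled or how, so the function names, the toy logits, and the choice to rescale a single logit are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scale_stop_logit(logits, stop_id, temperature):
    """Return a copy of `logits` with only the stop token's logit divided
    by `temperature`. Hypothetical helper, not taken from the paper."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = list(logits)
    scaled[stop_id] = scaled[stop_id] / temperature
    return scaled


def stop_probability(logits, stop_id, temperature=1.0):
    """Probability of emitting the stop token after rescaling its logit."""
    return softmax(scale_stop_logit(logits, stop_id, temperature))[stop_id]


# Toy next-token logits (assumed values); index 2 plays the stop token.
logits = [2.0, 1.0, -3.0]
STOP_ID = 2

p_base = stop_probability(logits, STOP_ID, temperature=1.0)
p_scaled = stop_probability(logits, STOP_ID, temperature=3.0)
```

In this toy setup the stop logit is negative, so a temperature above 1 moves it toward zero and raises the chance of stopping, which would shorten the trace on average. For a positive stop logit the same temperature would have the opposite effect, so the direction has to be chosen with the sign of the logit in mind.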

Findings

01. TLDR reduces token usage by about 50%.

02. Temperature scaling effectively controls reasoning length.

03. Methods outperform baseline approaches in benchmarks.

Abstract

Recent research enhances language model reasoning by scaling test-time compute via longer chain-of-thought traces. This often improves accuracy but also introduces redundancy and high computational cost, especially for small language models distilled with supervised fine-tuning (SFT). In this work, we propose new algorithms to improve token-efficient reasoning with small-scale models by effectively trading off accuracy and computation. We first show that the post-SFT model fails to determine the optimal stopping point of the reasoning process, resulting in verbose and repetitive outputs. Verbosity also significantly varies across wrong vs correct responses. To address these issues, we propose two solutions: (1) Temperature scaling (TS) to control the stopping point for the thinking phase and thereby trace length, and (2) TLDR: a length-regularized reinforcement learning method based on…
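The general idea of length regularization in a reinforcement learning reward can be sketched as follows. This is a generic illustration, not the TLDR objective: the abstract is truncated before the method is described, so the linear penalty, the `penalty` weight, and the `max_tokens` budget are hypothetical choices.

```python
def length_regularized_reward(is_correct, num_tokens, max_tokens, penalty=0.5):
    """Correctness reward minus a penalty proportional to trace length.

    Illustrative assumption: the penalty grows linearly with the fraction
    of the token budget used and is capped once the budget is reached.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    used_fraction = min(num_tokens, max_tokens) / max_tokens
    correctness = 1.0 if is_correct else 0.0
    return correctness - penalty * used_fraction


# Two correct answers of different lengths, and one short wrong answer.
short_correct = length_regularized_reward(True, 100, 1000)
long_correct = length_regularized_reward(True, 900, 1000)
short_wrong = length_regularized_reward(False, 100, 1000)
```

With a penalty weight below 1, any correct answer still scores higher than any incorrect one, so the length term only breaks ties among answers of equal correctness. That mirrors the trade-off the abstract describes between accuracy and computation, under the assumptions stated above.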

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Shrink and Fine-Tune · Spatio-temporal stability analysis