Select to Think: Unlocking SLM Potential with Local Sufficiency
Wenxuan Ye, Yangyang Zhang, Xueli An, Georg Carle, Yunpu Ma

TL;DR
This paper introduces Select to Think (S2T), a method in which a small language model (SLM) re-ranks its own top-K next-token candidates to reproduce a large language model's choices, improving reasoning performance without external LLM calls.
Contribution
It identifies local sufficiency and proposes a selection-based distillation approach that lets SLMs re-rank their own candidates autonomously, reducing inference costs.
Findings
The top-8 candidates of a 1.5B SLM contain the 32B LLM's chosen token 95% of the time.
S2T-LOCAL improves greedy decoding by 24.1% across benchmarks.
The method matches 8-path self-consistency performance with single-trajectory efficiency.
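The top-K finding above can be read as a hit rate: the fraction of positions at which the LLM's chosen token appears among the SLM's K highest-scoring candidates. The following is a minimal sketch of that measurement on toy data, not the authors' evaluation code. The function names `top_k_indices` and `top_k_hit_rate`, the logits, and the LLM choices are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def top_k_indices(logits: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest logits, highest first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


def top_k_hit_rate(
    slm_logits: Sequence[Sequence[float]],
    llm_choices: Sequence[int],
    k: int,
) -> float:
    """Fraction of positions where the LLM's token is in the SLM's top-k."""
    hits = 0
    for logits, choice in zip(slm_logits, llm_choices):
        if choice in top_k_indices(logits, k):
            hits += 1
    return hits / len(llm_choices)


# Toy data (hypothetical): three positions over a four-token vocabulary.
toy_slm_logits = [
    [2.0, 1.5, 0.1, -1.0],  # SLM top-1 is token 0
    [0.2, 0.1, 3.0, 2.5],   # SLM top-1 is token 2
    [1.0, 0.9, 0.8, 0.7],   # SLM top-1 is token 0
]
toy_llm_choices = [1, 2, 3]  # token the LLM would pick at each position

for k in (1, 2, 4):
    print(k, top_k_hit_rate(toy_slm_logits, toy_llm_choices, k))
```

On this toy data the hit rate rises as K grows, which is the pattern the finding describes: the SLM often ranks the LLM's token highly without ranking it first.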
Abstract
Small language models (SLMs) offer computational efficiency for scalable deployment, yet they often fall short of the reasoning power exhibited by their larger counterparts (LLMs). To mitigate this gap, current approaches invoke an LLM to generate tokens at points of reasoning divergence, but these external calls introduce substantial latency and costs. Alternatively, standard distillation is often hindered by the capacity limitation, as SLMs struggle to accurately mimic the LLM's complex generative distribution. We address this dilemma by identifying local sufficiency: at divergence points, the LLM's preferred token consistently resides within the SLM's top-K next-token predictions, even when failing to emerge as the SLM top-1 choice. We therefore propose SELECT TO THINK (S2T), which reframes the LLM's role from open-ended generation to selection among the SLM's proposals, simplifying…
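The shift from open-ended generation to selection can be illustrated with a small sketch: the candidate set is fixed to the SLM's top-K tokens, and a separate preference score only decides among those K. This is an illustrative assumption about the general idea, not the paper's implementation. The function `select_from_top_k` and the `selector_scores` input are hypothetical; in practice the scores could come from an LLM or from a trained selection head.

```python
from typing import List, Sequence


def select_from_top_k(
    slm_logits: Sequence[float],
    selector_scores: Sequence[float],
    k: int,
) -> int:
    """Pick the selector's favourite token among the SLM's top-k candidates."""
    # Candidate set: the k tokens the SLM itself ranks highest.
    candidates: List[int] = sorted(
        range(len(slm_logits)), key=lambda i: slm_logits[i], reverse=True
    )[:k]
    # The selector never proposes tokens; it only chooses among candidates.
    return max(candidates, key=lambda i: selector_scores[i])


# Toy data (hypothetical): one position over a four-token vocabulary.
toy_logits = [2.0, 1.5, 0.1, -1.0]
toy_selector = [0.1, 0.7, 0.05, 0.9]

print(select_from_top_k(toy_logits, toy_selector, k=1))  # greedy: SLM top-1
print(select_from_top_k(toy_logits, toy_selector, k=2))  # selector overrides
```

With K=1 this reduces to greedy decoding. With K=2 the selector can override the SLM's first choice, but only with a token the SLM already proposed, so token 3 is never chosen here despite having the highest selector score.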
