Star Elastic: Many-in-One Reasoning LLMs with Efficient Budget Control
Ali Taghibakhshi, Ruisi Cai, Saurav Muralidharan, Sharath Turuvekere Sreenivas, Aditya Vavre, Ameya Sunil Mahabaleshwarkar, Bilal Kartal, Sheldon Liang, Marcin Chochowski, Zijia Chen, Akhiad Bercovich, Ran Zilberstein, Ran El-Yaniv, Yonatan Geifman, Daniel Korzekwa, Yoshi Suhara

TL;DR
Star Elastic introduces a post-training method for large language models that creates nested submodels, enabling efficient inference, elastic resource allocation, and significant cost reductions while maintaining or improving performance.
Contribution
It presents a novel elastic budget control technique for LLMs that supports nested submodels, dynamic inference, and curriculum distillation, reducing training costs and improving accuracy-latency trade-offs.
Findings
Nested models match or outperform independent baselines.
Achieves a 360x reduction in training cost compared to pretraining from scratch.
Up to 16% higher accuracy and 1.9x lower latency with dynamic model selection.
Abstract
Training a family of large language models (LLMs), either from scratch or via iterative compression, is prohibitively expensive and inefficient, because it requires a separate training run for each model in the family. In this paper, we introduce Star Elastic, a novel LLM post-training method that adds N nested submodels to a given parent reasoning model within a single post-training job, using the compute of one run (an N-fold saving). Beyond reducing training costs, Star Elastic also addresses a fundamental limitation of efficient reasoning: the rigidity of static architectures, which forces the allocation of constant resources regardless of token difficulty. By unlocking elastic budget control, Star Elastic enables a novel inference scheme that uses a different submodel for each reasoning phase (thinking and answering). Star Elastic supports (1) nesting along the SSM, embedding channel, MoE, and FFN…
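The abstract describes submodels that are nested inside the parent (sharing its weights) and an inference scheme that picks a submodel per reasoning phase. The sketch below illustrates that general idea under loudly stated assumptions: nesting is shown only along the FFN hidden dimension, a width-k submodel keeps the first k hidden channels (a prefix-slicing convention common in elastic, Matryoshka-style models, assumed here rather than taken from the paper), the ReLU nonlinearity is arbitrary, and the per-phase widths in `PHASE_WIDTH` are hypothetical placeholders, not Star Elastic's budget policy. `NestedFFN`, `PHASE_WIDTH`, and `width_for_phase` are illustrative names only.

```python
"""Hypothetical sketch of weight-sharing nested FFN submodels (not Star Elastic's code)."""
import random


class NestedFFN:
    """Two-layer FFN whose smaller submodels reuse a prefix of the hidden channels."""

    def __init__(self, d_model: int, d_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        # Parent weights: w_in is d_hidden x d_model, w_out is d_model x d_hidden.
        self.w_in = [[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
        self.w_out = [[rng.gauss(0.0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]
        self.d_model = d_model
        self.d_hidden = d_hidden

    def forward(self, x: list[float], width: int) -> list[float]:
        """Run the submodel that keeps only the first `width` hidden channels."""
        if not 1 <= width <= self.d_hidden:
            raise ValueError("width must be in [1, d_hidden]")
        # Hidden activations for the kept channels only (ReLU assumed for illustration).
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w_in[:width]]
        # Output projection uses the matching prefix of columns, so no new weights are needed.
        return [sum(w * hi for w, hi in zip(row[:width], h)) for row in self.w_out]

    def num_params(self, width: int) -> int:
        """Parameters touched by the width-`width` submodel (both projections, no biases)."""
        return 2 * self.d_model * width


# Hypothetical budget policy: one submodel width for thinking, another for answering.
# These values are placeholders; the paper's actual per-phase choices are not assumed here.
PHASE_WIDTH = {"thinking": 4, "answering": 8}


def width_for_phase(phase: str) -> int:
    """Look up the submodel width to use during a given reasoning phase."""
    return PHASE_WIDTH[phase]


if __name__ == "__main__":
    ffn = NestedFFN(d_model=4, d_hidden=8, seed=0)
    x = [0.5, -1.0, 2.0, 0.25]
    for phase in ("thinking", "answering"):
        w = width_for_phase(phase)
        print(phase, w, ffn.num_params(w), ffn.forward(x, w))
```

Because every submodel is a prefix of the same weight matrices, a single set of parameters serves all budgets, so switching submodels between the thinking and answering phases only changes how much of each matrix is read, not which weights are loaded.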
