Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning

Zichuan Fu; Xian Wu; Guojing Li; Yejing Wang; Yijun Chen; Zihao Zhao; Yixuan Luo; Hanyu Yan; Yefeng Zheng; Xiangyu Zhao

arXiv:2604.23623·cs.AI·April 28, 2026

TL;DR

Tandem is a collaborative framework that combines large and small language models to perform reasoning tasks more efficiently, reducing computational costs by about 40% while maintaining high performance.

Contribution

The paper introduces an LLM-SLM collaboration approach for efficient reasoning that includes a cost-aware termination mechanism. The code is available online.

Findings

1. Reduces computational costs by approximately 40% compared to standalone LLM reasoning.
2. Achieves superior or competitive performance on mathematical reasoning and code generation benchmarks.
3. The sufficiency classifier transfers effectively across different domains without retraining.

Abstract

Recent advancements in large language models (LLMs) have catalyzed the rise of reasoning-intensive inference paradigms, where models perform explicit step-by-step reasoning before generating final answers. While such approaches improve answer quality and interpretability, they incur substantial computational overhead due to the prolonged generation sequences. In this paper, we propose Tandem, a novel collaborative framework that synergizes large and small language models (LLMs and SLMs) to achieve high-quality reasoning with significantly reduced computational cost. Specifically, the LLM serves as a strategic coordinator, efficiently generating a compact set of critical reasoning insights. These insights are then used to guide a smaller, more efficient SLM in executing the full reasoning process and delivering the final response. To balance efficiency and reliability, Tandem introduces…
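As a rough illustration of the division of labor the abstract describes, the following Python sketch has an LLM stand-in produce short insights, a sufficiency check decide when to stop, and an SLM stand-in carry out the full reasoning. It assumes insights are requested one at a time, which the text above does not confirm. The names `tandem_sketch`, `next_insight`, `is_sufficient`, `solve_with_insights`, and the `max_insights` cap are hypothetical and are not taken from the paper or its repository.

```python
from typing import Callable, List, Tuple


def tandem_sketch(
    question: str,
    next_insight: Callable[[str, List[str]], str],         # hypothetical LLM stand-in
    is_sufficient: Callable[[str, List[str]], bool],       # hypothetical sufficiency check
    solve_with_insights: Callable[[str, List[str]], str],  # hypothetical SLM stand-in
    max_insights: int = 4,                                 # assumed budget on LLM calls
) -> Tuple[str, int]:
    """Return the SLM's answer and the number of LLM insights used."""
    insights: List[str] = []
    while len(insights) < max_insights:
        # The expensive model contributes one compact insight per round.
        insights.append(next_insight(question, insights))
        # Stop paying for the expensive model once the guidance looks sufficient.
        if is_sufficient(question, insights):
            break
    # The cheaper model performs the full reasoning, guided by the insights.
    return solve_with_insights(question, insights), len(insights)


# Toy stand-ins so the sketch runs end to end; none of these is a real model.
def toy_llm(question: str, insights: List[str]) -> str:
    return f"hint {len(insights) + 1}"


def toy_classifier(question: str, insights: List[str]) -> bool:
    return len(insights) >= 2


def toy_slm(question: str, insights: List[str]) -> str:
    return f"answer to {question!r} using {len(insights)} hints"


answer, used = tandem_sketch("2 + 2?", toy_llm, toy_classifier, toy_slm)
```

In this toy run the loop stops after the second insight because the stand-in classifier reports sufficiency. If the classifier never fires, the assumed `max_insights` cap bounds how often the expensive model is called.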

Code & Models

Repositories

Applied-Machine-Learning-Lab/ACL2026_Tandem (GitHub)
