GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts

Wenhao Zeng; Xuteng Zhang; Yuling Shi; Chao Hu; Yuting Chen; Beijun Shen; Xiaodong Gu

arXiv:2601.05110·cs.AI·April 29, 2026


TL;DR

GlimpRouter is a training-free framework that improves reasoning efficiency by routing steps to large or small models based on initial token entropy, reducing latency while maintaining accuracy.

Contribution

The paper introduces an entropy-based, step-wise collaboration method that predicts the difficulty of each reasoning step from its first token, enabling efficient inference without additional training.

Findings

01 Achieves 10.7% accuracy improvement on AIME25.

02 Reduces inference latency by 25.9% compared to large models.

03 Effectively predicts step difficulty using initial token entropy.

Abstract

Large Reasoning Models (LRMs) achieve remarkable performance by explicitly generating multi-step chains of thought, but this capability incurs substantial inference latency and computational cost. Collaborative inference offers a promising solution by selectively allocating work between lightweight and large models, yet a fundamental challenge remains: determining when a reasoning step requires the capacity of a large model or the efficiency of a small model. Existing routing strategies either rely on local token probabilities or post-hoc verification, introducing significant inference overhead. In this work, we propose a novel perspective on step-wise collaboration: the difficulty of a reasoning step can be inferred from its very first token. Inspired by the "Aha Moment" phenomenon in LRMs, we show that the entropy of the initial token serves as a strong predictor of step difficulty.…
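The routing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold value of 1.0 nats, and the rule "high first-token entropy goes to the large model" are all assumptions made for illustration. It only shows how the Shannon entropy of a step's first-token distribution could serve as a routing signal.

```python
import math
from typing import Sequence


def first_token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token probability distribution."""
    # Zero-probability tokens contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route_step(first_token_probs: Sequence[float], threshold: float = 1.0) -> str:
    """Pick a model for the upcoming reasoning step.

    Assumption (hypothetical rule): a step whose first token is highly
    uncertain is treated as hard and sent to the large model; otherwise
    the small model handles it. The threshold is illustrative only.
    """
    entropy = first_token_entropy(first_token_probs)
    return "large" if entropy > threshold else "small"


# A peaked distribution (confident first token) versus a flat one.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

print(route_step(confident))
print(route_step(uncertain))
```

In this sketch the decision needs only one forward pass for a single token, which is why a first-token signal can be cheaper than checks that run after a whole step has been generated. In practice the threshold would have to be tuned per model pair and benchmark.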


Code & Models

Repositories

zengwh02/GlimpRouter (GitHub)
