RankGuide: Tensor-Rank-Guided Routing and Steering for Efficient Reasoning
Jiayi Tian, Yupeng Su, Ryan Solgi, Souvik Kundu, Zheng Zhang

TL;DR
RankGuide uses tensor-rank signals to route and steer reasoning between small reasoning models (SRMs) and large reasoning models (LRMs), reducing inference latency while maintaining accuracy.
Contribution
Proposes a tensor-rank-based framework that detects SRM failures and guides reasoning, improving the efficiency of collaboration between small and large reasoning models.
Findings
Reduces inference latency by up to 1.75x compared to large reasoning models.
Effectively detects SRM failures using tensor-rank signals.
Maintains competitive accuracy while improving reasoning efficiency.
Abstract
Large reasoning models (LRMs) enhance problem-solving capabilities by generating explicit multi-step chain-of-thought (CoT) reasoning; however, they incur substantial inference latency and computational overhead. To mitigate this issue, recent works have explored model collaboration paradigms, where small reasoning models (SRMs) generate intermediate reasoning steps to achieve a better accuracy-latency trade-off. Despite recent progress, effectively and efficiently detecting and mitigating SRM failures in collaborative systems remains a key challenge. To address this issue, we analyze SRM inference in both the generated text and hidden-state spaces, and identify three types of failure modes: overconfidence, uncertainty, and heavy revalidation. Building on these insights, we propose RankGuide, a framework that improves the efficiency and…
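The abstract is truncated before it describes how the rank signal is computed, so the following is only an illustrative sketch of the general idea of using a rank-style statistic of SRM hidden states as a routing signal. It is not the authors' method. As a stand-in for a tensor rank, it uses the entropy-based effective rank of a (tokens x hidden_dim) matrix; the function names `effective_rank` and `should_escalate` and the thresholds `low` and `high` are hypothetical.

```python
import numpy as np


def effective_rank(hidden: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of a (tokens x hidden_dim) matrix.

    Assumption: this matrix statistic stands in for whatever tensor-rank
    signal the paper actually uses.
    """
    # Singular values only; the singular vectors are not needed here.
    s = np.linalg.svd(hidden, compute_uv=False)
    total = s.sum()
    if total <= eps:
        return 0.0
    # Normalize singular values into a distribution and drop numerical zeros.
    p = s / total
    p = p[p > eps]
    # exp(Shannon entropy) ranges from 1 (rank one) to min(tokens, hidden_dim).
    return float(np.exp(-(p * np.log(p)).sum()))


def should_escalate(hidden: np.ndarray, low: float, high: float) -> bool:
    """Hypothetical routing rule: hand the step to the LRM when the
    effective rank of the SRM hidden states leaves the band [low, high]."""
    r = effective_rank(hidden)
    return r < low or r > high
```

In a collaborative loop, a check like `should_escalate` might be evaluated on the hidden states of each SRM reasoning step, with `low` and `high` calibrated on held-out data. How RankGuide maps rank signals to the three failure modes is not stated in the visible part of the abstract.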
