Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning
Sangmook Lee, Dohyung Kim, Hyukhun Koh, Nakyeong Yang, Kyomin Jung

TL;DR
STEER is a confidence-guided, stepwise routing framework that dynamically allocates individual reasoning steps between smaller and larger LLMs, significantly reducing inference costs while maintaining or improving accuracy across diverse benchmarks.
Contribution
The paper introduces STEER, a novel domain-agnostic, internal-confidence-based routing method that eliminates the need for external models or costly data synthesis for cost-efficient LLM reasoning.
Findings
Achieves up to 20% accuracy improvement with 48% fewer FLOPs.
Outperforms baselines relying on trained external modules.
Demonstrates robustness across multiple domains and benchmarks.
Abstract
Recent advances in Large Language Models (LLMs) - particularly model scaling and test-time techniques - have greatly enhanced the reasoning capabilities of language models at the expense of higher inference costs. To lower inference costs, prior works train router models or deferral mechanisms that allocate easy queries to a small, efficient model, while forwarding harder queries to larger, more expensive models. However, these trained router models often lack robustness under domain shifts and require expensive data synthesis techniques such as Monte Carlo rollouts to obtain sufficient ground-truth routing labels for training. In this work, we propose Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning (STEER), a domain-agnostic framework that performs fine-grained, step-level routing between smaller and larger LLMs without utilizing external models. STEER leverages…
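The step-level routing idea described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract is cut off before it says which internal signal STEER uses, so the confidence score here (geometric-mean token probability of a step), the fixed threshold of 0.8, the [DONE] stop marker, and the names step_confidence, route_steps, toy_small, and toy_large are all hypothetical. The two toy functions stand in for real LLM calls.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for an LLM call: given the steps so far, return
# (next step text, per-token log-probabilities of that step).
StepFn = Callable[[Sequence[str]], Tuple[str, List[float]]]


def step_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability of a step (an assumed score)."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


@dataclass
class RoutedStep:
    text: str
    model: str               # which model produced the accepted step
    small_confidence: float  # small model's confidence in its own draft


def route_steps(small: StepFn, large: StepFn, threshold: float = 0.8,
                max_steps: int = 8, stop_marker: str = "[DONE]") -> List[RoutedStep]:
    """Draft each step with the small model; redo it with the large model
    only when the small model's confidence falls below the threshold."""
    steps: List[str] = []
    trace: List[RoutedStep] = []
    for _ in range(max_steps):
        text, logprobs = small(steps)
        conf = step_confidence(logprobs)
        model = "small"
        if conf < threshold:
            # Low confidence: discard the draft and ask the large model.
            text, _ = large(steps)
            model = "large"
        steps.append(text)
        trace.append(RoutedStep(text, model, conf))
        if stop_marker in text:
            break
    return trace


# Toy models with scripted outputs, for illustration only.
def toy_small(steps: Sequence[str]) -> Tuple[str, List[float]]:
    scripted = [
        ("Let x be the unknown in 3x = 12.", [-0.05, -0.02]),  # confident
        ("So x = 7?", [-1.2, -0.9]),                           # unsure
        ("Answer: 4 [DONE]", [-0.1, -0.1]),                    # confident
    ]
    return scripted[min(len(steps), len(scripted) - 1)]


def toy_large(steps: Sequence[str]) -> Tuple[str, List[float]]:
    return ("Dividing both sides by 3 gives x = 4.", [-0.01, -0.01])


trace = route_steps(toy_small, toy_large)
for s in trace:
    print(f"{s.model:5s} conf={s.small_confidence:.2f}  {s.text}")
```

In this toy run only the second step is escalated to the large model, which is the intended cost saving: the expensive model is called once rather than for every step. In practice the threshold would need tuning against an accuracy and cost budget.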
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Graph Neural Networks
