LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing

Hao Li, Yiqun Zhang, Zhaoyan Guo, Chenxu Wang, Shengji Tang, Qiaosheng Zhang, Yang Chen, Biqing Qi, Peng Ye, Lei Bai, Zhen Wang, and Shuyue Hu

arXiv:2601.07206 · cs.AI · January 13, 2026

Open Access

TL;DR

LLMRouterBench is a comprehensive benchmark and framework for evaluating large language model routing methods, revealing that many approaches perform similarly and highlighting areas for improvement in model recall and efficiency.

Contribution

Introduces a large-scale benchmark and unified framework for LLM routing, enabling systematic evaluation and comparison of routing methods across multiple metrics and datasets.
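The benchmark's own framework code is not reproduced on this page. The sketch below only illustrates the general idea behind a unified harness: every router exposes the same interface and is scored on identical splits, so differences in results come from the routers rather than from the evaluation setup. The `Router` protocol, the `BestSingleModel` baseline, `evaluate`, and the split layout are hypothetical names chosen for illustration, not LLMRouterBench's actual API.

```python
from typing import Dict, List, Protocol, Sequence, Tuple

# 0/1 correctness matrix: one row per query, one column per candidate model.
Matrix = List[List[int]]
# Hypothetical split layout: (train queries, train correctness, test queries, test correctness).
Split = Tuple[Sequence[str], Matrix, Sequence[str], Matrix]


class Router(Protocol):
    """Hypothetical common interface that every routing method implements."""

    def fit(self, queries: Sequence[str], correct: Matrix) -> None: ...

    def route(self, queries: Sequence[str]) -> List[int]: ...


class BestSingleModel:
    """Baseline router: send every query to the model with the best training accuracy."""

    def __init__(self) -> None:
        self.model = 0

    def fit(self, queries: Sequence[str], correct: Matrix) -> None:
        n_models = len(correct[0])
        # Count correct answers per model column; ties go to the lowest index.
        totals = [sum(row[m] for row in correct) for m in range(n_models)]
        self.model = max(range(n_models), key=totals.__getitem__)

    def route(self, queries: Sequence[str]) -> List[int]:
        return [self.model] * len(queries)


def evaluate(router: Router, datasets: Dict[str, Split]) -> Dict[str, float]:
    """Fit and test one router on every dataset using the same splits for all routers."""
    scores: Dict[str, float] = {}
    for name, (train_q, train_c, test_q, test_c) in datasets.items():
        router.fit(train_q, train_c)
        picks = router.route(test_q)
        # Accuracy = correctness of the single model picked for each test query.
        scores[name] = sum(test_c[i][m] for i, m in enumerate(picks)) / len(test_q)
    # Macro average over datasets (the right-hand side is computed before this key exists).
    scores["macro_avg"] = sum(scores.values()) / len(datasets)
    return scores
```

For instance, `evaluate(BestSingleModel(), datasets)` returns one accuracy per dataset plus a `macro_avg` entry. Swapping in a different router changes nothing else in the loop, which is what makes side-by-side comparisons meaningful.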

Findings

01 Many routing methods perform similarly under unified evaluation.

02 Recent approaches often fail to outperform simple baselines.

03 A significant gap to the Oracle remains, driven mainly by model-recall failures.
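To make the Oracle gap concrete, the sketch below computes it on a toy 0/1 correctness matrix that is invented for illustration, not taken from the benchmark. The Oracle picks a correct model whenever one exists, so its accuracy is the share of queries that any model solves; comparing it with the best single model shows model complementarity. The paper's exact definition of model recall is not stated on this page, so `recall_on_solvable` is a hypothetical proxy (how often the router picks a correct model on queries that some model can solve), and all function names are assumptions.

```python
from typing import Sequence

# Assumed inputs: correct[i][m] is 1 if model m answers query i correctly, else 0;
# choice[i] is the model index a router selected for query i.


def oracle_accuracy(correct: Sequence[Sequence[int]]) -> float:
    """Oracle: a query counts as solved if ANY model answers it correctly."""
    return sum(1 for row in correct if any(row)) / len(correct)


def best_single_accuracy(correct: Sequence[Sequence[int]]) -> float:
    """Accuracy of the strongest individual model on the same queries."""
    n_models = len(correct[0])
    return max(sum(row[m] for row in correct) for m in range(n_models)) / len(correct)


def router_accuracy(correct: Sequence[Sequence[int]], choice: Sequence[int]) -> float:
    """Accuracy of the single model the router picks for each query."""
    return sum(correct[i][m] for i, m in enumerate(choice)) / len(correct)


def recall_on_solvable(correct: Sequence[Sequence[int]], choice: Sequence[int]) -> float:
    """Hypothetical model-recall proxy: among queries at least one model solves,
    the fraction where the router picks a model that solves it."""
    solvable = [i for i, row in enumerate(correct) if any(row)]
    if not solvable:
        return 0.0
    return sum(correct[i][choice[i]] for i in solvable) / len(solvable)


correct = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
choice = [0, 0, 2, 1, 1]
print(oracle_accuracy(correct))             # 0.8
print(best_single_accuracy(correct))        # 0.4
print(router_accuracy(correct, choice))     # 0.6
print(recall_on_solvable(correct, choice))  # 0.75
```

Here the Oracle (0.8) doubles the best single model (0.4), yet the router reaches only 0.6. The gap comes entirely from the second query, which only model 1 solves while the router picked model 0: a recall failure in the sense sketched above.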

Abstract

Large language model (LLM) routing assigns each query to the most suitable model from an ensemble. We introduce LLMRouterBench, a large-scale benchmark and unified framework for LLM routing. It comprises over 400K instances from 21 datasets and 33 models. Moreover, it provides comprehensive metrics for both performance-oriented routing and performance-cost trade-off routing, and integrates 10 representative routing baselines. Using LLMRouterBench, we systematically re-evaluate the field. While confirming strong model complementarity, the central premise of LLM routing, we find that many routing methods exhibit similar performance under unified evaluation, and that several recent approaches, including commercial routers, fail to reliably outperform a simple baseline. Meanwhile, a substantial gap to the Oracle remains, driven primarily by persistent model-recall failures. We further show that…
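The abstract distinguishes performance-oriented routing from performance-cost trade-off routing. One common way to express the trade-off, assumed here rather than taken from the paper, is to score each candidate model by its predicted accuracy minus a weighted cost and route to the highest score. The names `route_with_cost`, `pred_acc`, `cost`, and `lam`, as well as the toy numbers, are hypothetical.

```python
from typing import List, Sequence


def route_with_cost(
    pred_acc: Sequence[Sequence[float]], cost: Sequence[float], lam: float
) -> List[int]:
    """Pick, per query, the model maximizing predicted accuracy - lam * cost.

    pred_acc[i][m]: a router's estimated probability that model m answers query i.
    cost[m]: assumed per-query cost of model m (any consistent unit).
    lam: trade-off weight; lam = 0 reduces to purely performance-oriented routing.
    """
    choices = []
    for scores in pred_acc:
        utility = [s - lam * c for s, c in zip(scores, cost)]
        # argmax over models; ties go to the lowest model index
        choices.append(max(range(len(utility)), key=utility.__getitem__))
    return choices


pred_acc = [[0.90, 0.85], [0.40, 0.80]]
cost = [10.0, 1.0]  # model 0: strong but expensive; model 1: cheap
print(route_with_cost(pred_acc, cost, lam=0.0))   # [0, 1]
print(route_with_cost(pred_acc, cost, lam=0.01))  # [1, 1]
```

Raising `lam` shifts queries toward cheaper models once the expected accuracy loss is small enough. Measuring accuracy and total cost at several `lam` values traces a performance-cost curve on which routers can be compared.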

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Natural Language Processing Techniques