Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems
Wanxing Wu, He Zhu, Yixia Li, Lei Yang, Jiehui Zhao, Hongru Wang, Jian Yang, Benyou Wang, Bingyi Jing, Guanhua Chen

TL;DR
This paper introduces RouterXBench, a comprehensive evaluation framework for routers in collaborative LLM systems, and proposes ProbeDirichlet, a robust, probabilistic router that outperforms existing methods across various scenarios.
Contribution
The paper presents RouterXBench for systematic router evaluation and introduces ProbeDirichlet, a novel probabilistic router leveraging internal hidden states for improved robustness and generalization.
Findings
ProbeDirichlet achieves 16.68% and 18.86% improvements over baselines.
It generalizes well across model types, scales, and tasks.
RouterXBench provides a multi-dimensional evaluation of router performance.
Abstract
Large language models (LLMs) have achieved success, but cost and privacy constraints necessitate deploying smaller models locally while offloading complex queries to cloud-based models. Existing router evaluations are unsystematic, overlooking scenario-specific requirements and out-of-distribution robustness. We propose RouterXBench, a principled evaluation framework with three dimensions: router ability, scenario alignment, and cross-domain robustness. Unlike prior work that relies on output probabilities or external embeddings, we utilize internal hidden states that capture model uncertainty before answer generation. We introduce ProbeDirichlet, a lightweight router that aggregates cross-layer hidden states via learnable Dirichlet distributions with probabilistic training. Trained on multi-domain data, it generalizes robustly across in-domain and out-of-distribution scenarios. Our…
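The routing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the sigmoid linear probe, the log-parameterized concentrations, and the 0.5 threshold are all assumptions made for illustration. The abstract only states that cross-layer hidden states are aggregated via learnable Dirichlet distributions with probabilistic training. Here, layer weights are either the Dirichlet mean (inference-style) or a sample drawn by normalizing Gamma variates (one plausible reading of "probabilistic training").

```python
import math
import random

def layer_weights(log_alpha, rng=None):
    """Mixing weights over layers from Dirichlet concentrations (hypothetical helper).

    log_alpha holds unconstrained parameters; exp() keeps concentrations positive.
    Without rng, return the Dirichlet mean; with rng, draw one Dirichlet sample.
    """
    alpha = [math.exp(a) for a in log_alpha]
    if rng is None:
        total = sum(alpha)
        return [a / total for a in alpha]
    # A Dirichlet sample is a vector of independent Gamma(alpha_i, 1) draws, normalized.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def confidence_score(hidden_states, log_alpha, probe_w, probe_b, rng=None):
    """Score in (0, 1) that the small model can handle the query (assumed probe form).

    hidden_states: one vector per layer, e.g. taken before answer generation.
    """
    weights = layer_weights(log_alpha, rng)
    dim = len(probe_w)
    # Weighted sum of the per-layer vectors gives a single pooled representation.
    pooled = [
        sum(w * layer[j] for w, layer in zip(weights, hidden_states))
        for j in range(dim)
    ]
    logit = sum(p * w for p, w in zip(pooled, probe_w)) + probe_b
    return 1.0 / (1.0 + math.exp(-logit))

def route(score, threshold=0.5):
    """Keep the query local when the score clears the threshold, else offload."""
    return "local" if score >= threshold else "cloud"

# Toy usage: 3 layers, hidden size 3, uniform concentrations.
hidden = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2], [0.1, 0.9, 0.0]]
score = confidence_score(hidden, [0.0, 0.0, 0.0], probe_w=[1.0, 1.0, 1.0], probe_b=0.0)
print(route(score))
```

In a deployment, the threshold would presumably be tuned per scenario (for example, stricter when accuracy matters more than cost), which is the kind of trade-off a scenario-aligned evaluation is meant to expose.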
Taxonomy
Topics: Advanced Graph Neural Networks · Privacy-Preserving Technologies in Data · Big Data and Digital Economy
