Eagle: Efficient Training-Free Router for Multi-LLM Inference
Zesen Zhao, Shuowei Jin, Z. Morley Mao

TL;DR
Eagle is a training-free, scalable LLM routing method that improves model selection accuracy and efficiency in high-volume online environments by combining global and local ranking modules.
Contribution
Eagle introduces a novel training-free LLM routing approach using global and local ELO ranking modules for better scalability and real-time adaptation.
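To make the "global plus local ELO ranking" idea concrete, here is a minimal sketch. The details are assumptions, not Eagle's published implementation. Feedback is stored as hypothetical (query embedding, winning model, losing model) records. The global score is a standard Elo rating replayed over all records. The local score is an Elo rating over only the records whose queries are most similar, by cosine similarity, to the incoming query. The two are blended with a hypothetical weight `p`. The K-factor, initial rating, neighbor count, and `p` are all illustrative values.

```python
import math
from collections import defaultdict

K = 32.0       # hypothetical Elo K-factor
INIT = 1000.0  # hypothetical initial rating for every model


def expected(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_ratings(comparisons, k=K, init=INIT):
    """Replay pairwise outcomes (winner, loser) and return Elo ratings."""
    ratings = defaultdict(lambda: init)
    for winner, loser in comparisons:
        gain = k * (1.0 - expected(ratings[winner], ratings[loser]))
        ratings[winner] += gain  # zero-sum update: the loser drops by the same amount
        ratings[loser] -= gain
    return dict(ratings)


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def route_scores(query_emb, history, models, n_neighbors=2, p=0.5):
    """Blend a global Elo (all feedback) with a local Elo (nearest past queries only)."""
    global_r = elo_ratings([(w, l) for _, w, l in history])
    nearest = sorted(history, key=lambda rec: cosine(query_emb, rec[0]), reverse=True)[:n_neighbors]
    local_r = elo_ratings([(w, l) for _, w, l in nearest])
    return {m: p * global_r.get(m, INIT) + (1 - p) * local_r.get(m, INIT) for m in models}


# Toy feedback: model-a wins on queries near [1, 0], model-b wins near [0, 1].
history = [
    ([1.0, 0.0], "model-a", "model-b"),
    ([0.9, 0.1], "model-a", "model-b"),
    ([0.0, 1.0], "model-b", "model-a"),
]
scores = route_scores([0.0, 1.0], history, ["model-a", "model-b"], n_neighbors=1)
best = max(scores, key=scores.get)
print(best)
```

In this toy run, model-a leads globally. The local neighborhood of the new query favors model-b, and the blend tips the choice toward model-b. Because an Elo update is a constant-time adjustment per new comparison, a design of this kind can absorb fresh feedback without retraining. That property is consistent with the training-free, fast-update claims above, though the paper's exact mechanism may differ.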
Findings
Outperforms baseline methods with up to 23.52% AUC improvement.
Requires only 1/20 of baseline initialization time.
Offers 100-200x faster incremental updates in online scenarios.
Abstract
The proliferation of Large Language Models (LLMs) with varying capabilities and costs has created a need for efficient model selection in AI systems. LLM routers address this need by dynamically choosing the most suitable model for a given query based on task requirements and budget constraints. However, existing routers face challenges in scalability and real-time adaptation, particularly in high-volume online environments. We present Eagle, a novel LLM routing approach that combines global and local ELO ranking modules to overcome these limitations. By evaluating both general and specialized LLM abilities, Eagle provides a scalable, training-free solution that enhances model selection quality while reducing computational overhead. Our experiments across multiple datasets show Eagle consistently outperforms baseline methods, with improvements of up to 23.52 percent in Area Under Curve…
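The abstract frames routing as choosing a model "based on task requirements and budget constraints." One simple way to express that final step is sketched below. It takes per-model quality scores (for example, the blended ranking scores a router produces) and hypothetical per-query costs, and picks the best-scoring model that fits the budget. This rule is an illustrative assumption, not the selection policy described in the paper. The function name, cost figures, and the choice to return `None` when nothing is affordable are all hypothetical.

```python
from typing import Dict, Optional


def select_model(scores: Dict[str, float], costs: Dict[str, float], budget: float) -> Optional[str]:
    """Return the highest-scoring model whose per-query cost is within budget, else None."""
    affordable = [m for m in scores if costs.get(m, float("inf")) <= budget]
    if not affordable:
        return None  # leave the fallback policy (cheapest model, reject, etc.) to the caller
    return max(affordable, key=lambda m: scores[m])


# Hypothetical quality scores and per-query costs.
scores = {"large": 1050.0, "medium": 1010.0, "small": 960.0}
costs = {"large": 0.03, "medium": 0.01, "small": 0.002}
print(select_model(scores, costs, budget=0.015))
```

A tight budget steers the router to a cheaper model. A generous budget lets quality decide.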
