Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents
Kexin Chu, Dawei Xiang, Wei Zhang

TL;DR
This paper introduces LQM-ContextRoute, a contextual bandit router that selects among functionally equivalent tool providers in LLM agents, trading off latency against answer quality under runtime load.
Contribution
It proposes latency-quality matching, which ranks providers by expected answer quality per service cycle instead of using an additive reward in which low latency can offset poor answers. The router adapts online to both load changes and provider-quality differences.
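To make the contrast concrete, here is a minimal Python sketch under stated assumptions. Each provider has a hypothetical `expected_quality` in [0, 1] and a hypothetical `service_time_s` (expected seconds per request under current load), and "quality per service cycle" is read as their ratio. This is one plausible reading of the idea, not the authors' implementation; the provider names, numbers, and `latency_weight` are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Provider:
    name: str
    expected_quality: float  # hypothetical estimate of answer quality in [0, 1]
    service_time_s: float    # hypothetical expected seconds per request under current load


def additive_score(p: Provider, latency_weight: float = 1.0) -> float:
    # Additive reward: a very fast provider can offset poor answers.
    return p.expected_quality - latency_weight * p.service_time_s


def lqm_score(p: Provider) -> float:
    # Sketch of latency-quality matching: expected quality per unit of service time.
    # Quality scales the score multiplicatively, so zero quality always scores zero.
    return p.expected_quality / p.service_time_s


def rank(providers: List[Provider], score: Callable[[Provider], float]) -> List[str]:
    # Highest score first.
    return [p.name for p in sorted(providers, key=score, reverse=True)]


pool = [
    Provider("accurate", expected_quality=0.9, service_time_s=1.0),
    Provider("fast-but-broken", expected_quality=0.0, service_time_s=0.05),
    Provider("decent-and-fast", expected_quality=0.8, service_time_s=0.4),
]

print(rank(pool, additive_score))  # ['decent-and-fast', 'fast-but-broken', 'accurate']
print(rank(pool, lqm_score))       # ['decent-and-fast', 'accurate', 'fast-but-broken']
```

Under the additive reward, the near-instant provider that returns useless answers outranks the accurate one. Under the ratio, a zero-quality provider scores zero no matter how fast it is, while a genuinely faster provider of decent quality can still win.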
Findings
Improves F1 by +2.18 pp on the main web-search benchmark.
Improves accuracy by up to +18 pp in the StrategyQA setting.
Increases NDCG by +2.91 to +3.22 pp on heterogeneous retriever pools.
Abstract
Tool-augmented LLM agents increasingly access the same tool type through multiple functionally equivalent providers, such as web-search APIs, retrievers, or LLM backends exposed behind a shared interface. This creates a provider-routing problem under runtime load: the router must choose among providers that differ in latency, reliability, and answer quality, often without gold labels at deployment time. We introduce LQM-ContextRoute, a contextual bandit router for same-function tool providers. Its key design is latency-quality matching: instead of letting low latency offset poor answers in an additive reward, the router ranks providers by expected answer quality per service cycle. It combines this capacity-aware score with query-specific quality estimation and LLM-as-judge feedback, allowing it to adapt online to both load changes and provider-quality differences. On the main web-search…
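The online loop described in the abstract could look roughly like the following Python sketch. It is a hypothetical tabular contextual bandit, not the authors' implementation: the context is assumed to be a discrete query category, `judge_score` stands in for an LLM-as-judge rating in [0, 1], and an exponentially weighted moving average (EWMA) of observed latency stands in for the capacity-aware service-cycle estimate. The class name `LQMRouterSketch` and its parameters (`explore`, `alpha`, `init_latency`) are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


class LQMRouterSketch:
    """Hypothetical router: optimistic (UCB) judge-scored quality per context,
    divided by an EWMA of each provider's observed latency."""

    def __init__(self, providers: Iterable[str], explore: float = 0.5,
                 alpha: float = 0.2, init_latency: float = 1.0):
        self.providers = list(providers)
        self.explore = explore  # width of the UCB exploration bonus
        self.alpha = alpha      # EWMA weight on the newest latency sample
        self.q_sum: Dict[Tuple[str, str], float] = defaultdict(float)
        self.n: Dict[Tuple[str, str], int] = defaultdict(int)
        self.latency = {p: init_latency for p in self.providers}

    def select(self, context: str) -> str:
        total = sum(self.n[(context, p)] for p in self.providers) + 1

        def score(p: str) -> float:
            key = (context, p)
            if self.n[key] == 0:
                return math.inf  # try every provider at least once per context
            mean_q = self.q_sum[key] / self.n[key]
            bonus = self.explore * math.sqrt(math.log(total) / self.n[key])
            # Optimistic quality (capped at 1) per unit of load-dependent latency.
            return min(1.0, mean_q + bonus) / self.latency[p]

        return max(self.providers, key=score)  # ties go to the first provider listed

    def update(self, context: str, provider: str,
               judge_score: float, observed_latency: float) -> None:
        key = (context, provider)
        self.q_sum[key] += judge_score  # label-free feedback from a judge
        self.n[key] += 1
        old = self.latency[provider]
        self.latency[provider] = (1 - self.alpha) * old + self.alpha * observed_latency


router = LQMRouterSketch(["A", "B"], explore=0.0)  # no bonus, for a deterministic demo
for _ in range(20):
    router.update("web", "A", judge_score=0.9, observed_latency=1.0)
    router.update("web", "B", judge_score=0.2, observed_latency=0.3)
print(router.select("web"))  # A: 0.9 / 1.0 beats 0.2 / ~0.31
for _ in range(20):
    router.update("web", "A", judge_score=0.9, observed_latency=5.0)  # A is now overloaded
print(router.select("web"))  # B: 0.2 / ~0.31 beats 0.9 / ~4.95
```

The last lines show the load adaptation: A keeps its higher judged quality, but once its latency estimate rises, the quality-per-latency score shifts traffic to B. With `explore` above zero, the UCB bonus keeps occasionally sampling the less-used provider so its quality estimate does not go stale.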
