Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference
Huamin Chen, Xunzhuo Liu, Junchen Jiang, Bowei He, Xue Liu

TL;DR
This paper introduces Outcome-Aware Tool Selection (OATS), a method for semantic routers that improves tool selection accuracy at no added serving cost by refining tool embeddings offline with historical outcome data.
Contribution
The paper proposes a novel offline interpolation technique for tool embeddings that enhances semantic router performance without increasing serving latency.
Findings
OATS improves NDCG@5 from 0.869 to 0.940 on MetaTool
OATS improves NDCG@5 from 0.834 to 0.848 on ToolBench
A 197K-parameter contrastive adapter achieves comparable gains on MetaTool (NDCG@5: 0.931) with minimal added latency
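For readers unfamiliar with the metric reported above, the following is a minimal sketch of NDCG@5 for a single query. It assumes binary relevance (a tool is either a correct choice or not) and a ranked list without duplicates. The paper's exact relevance definition is not given in this summary, and the function names are illustrative only.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (rank 0 is the top)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_tools, relevant_tools, k=5):
    """NDCG@k with binary relevance (an assumption, not the paper's stated setup)."""
    gains = [1.0 if tool in relevant_tools else 0.0 for tool in ranked_tools]
    # Ideal ranking places every relevant tool at the top of the list.
    ideal = [1.0] * min(len(relevant_tools), k)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0
```

A reported score would then be the mean of `ndcg_at_k` over all test queries. Under this definition, a relevant tool that drops from rank 1 to rank 2 lowers the per-query score from 1.0 to about 0.63, which is why improvements in ordering within the top 5 show up in the metric.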
Abstract
Semantic routers in LLM inference gateways select tools in the critical request path, where every millisecond of added latency compounds across millions of requests. We propose Outcome-Aware Tool Selection (OATS), which interpolates tool embeddings toward the centroid of queries where they historically succeed, an offline process that adds no parameters, latency, or GPU cost at serving time. On MetaTool (199 tools, 4,287 queries), this improves NDCG@5 from 0.869 to 0.940; on ToolBench (2,413 APIs), from 0.834 to 0.848. We also evaluate two learned extensions: a 2,625-parameter MLP re-ranker and a 197K-parameter contrastive adapter. The MLP re-ranker hurts or matches baseline when outcome data is sparse relative to the tool set; the contrastive adapter provides comparable gains on MetaTool (NDCG@5: 0.931). All methods are evaluated on the same held-out 30% test split. The practical…
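The offline interpolation described in the abstract can be sketched roughly as follows. This is a hedged illustration, not the authors' implementation: the linear blend with weight `alpha`, the re-normalization step, the `(query_index, tool_index)` format of `success_pairs`, and all function names are assumptions introduced here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products act as cosine similarity."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def refine_tool_embeddings(tool_emb, query_emb, success_pairs, alpha=0.3):
    """Offline step: pull each tool embedding toward the centroid of the
    queries on which that tool historically succeeded.

    success_pairs: iterable of (query_index, tool_index) (assumed format).
    alpha: interpolation weight (hypothetical default, not from the paper).
    """
    refined = tool_emb.astype(float).copy()
    for t in range(tool_emb.shape[0]):
        q_idx = [q for q, tool in success_pairs if tool == t]
        if not q_idx:
            continue  # no outcome data for this tool: keep its original embedding
        centroid = query_emb[q_idx].mean(axis=0)
        refined[t] = (1.0 - alpha) * tool_emb[t] + alpha * centroid
    return l2_normalize(refined)


def route(query_vec, tool_emb, k=5):
    """Serving step: unchanged top-k similarity lookup over the tool table."""
    scores = tool_emb @ query_vec
    return np.argsort(-scores)[:k]
```

Because `refine_tool_embeddings` runs ahead of time and returns a table with the same shape as the input, the serving path in `route` is identical before and after refinement, which matches the abstract's claim of no added parameters or latency at serving time.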
Taxonomy
Topics: Network Packet Processing and Optimization · Software-Defined Networks and 5G · Advanced Neural Network Applications
