Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space

Cheng Yan; Wuyang Zhang; Zhiyuan Ning; Fan Xu; Ziyang Tao; Lu Zhang; Bing Yin; Yanyong Zhang

arXiv:2601.06220 · cs.LG · January 13, 2026

TL;DR

ZeroRouter introduces a universal latent space for LLM routing, enabling zero-shot model onboarding and significantly reducing retraining costs while improving accuracy, cost-efficiency, and latency in model selection.

Contribution

The paper proposes a universal latent space for LLM routing that decouples query characterization from model profiling, allowing new models to be onboarded zero-shot, without retraining.
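The decoupling described above can be illustrated with a minimal sketch. It assumes, purely for illustration, an IRT-style logistic link between a query's latent vector and a per-model profile vector; the class `LatentRouterSketch`, its methods, and the calibration data are hypothetical and are not ZeroRouter's published design. The point it shows: onboarding a new model fits only that model's profile on a small calibration set, while the (here stubbed) query encoder and every existing profile stay untouched.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class LatentRouterSketch:
    """Hypothetical router in which queries and models share one latent space.

    Query side: a frozen, model-agnostic encoder maps each query to a latent
    vector z_q (passed in directly here). Model side: each model m is a profile
    (theta_m, b_m), and P(m answers q correctly) = sigmoid(theta_m . z_q + b_m).
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.profiles: Dict[str, Tuple[List[float], float]] = {}

    def onboard(self, name: str, calib_z: List[List[float]],
                calib_correct: List[int], lr: float = 0.5,
                epochs: int = 500) -> None:
        # Fit ONLY the new model's profile with logistic regression (batch
        # gradient descent); the encoder and other profiles are not touched.
        theta = [0.0] * self.dim
        bias = 0.0
        n = len(calib_z)
        for _ in range(epochs):
            g_theta = [0.0] * self.dim
            g_bias = 0.0
            for z, y in zip(calib_z, calib_correct):
                err = sigmoid(dot(theta, z) + bias) - y
                for i in range(self.dim):
                    g_theta[i] += err * z[i]
                g_bias += err
            theta = [t - lr * g / n for t, g in zip(theta, g_theta)]
            bias -= lr * g_bias / n
        self.profiles[name] = (theta, bias)

    def success_prob(self, name: str, z: Sequence[float]) -> float:
        theta, bias = self.profiles[name]
        return sigmoid(dot(theta, z) + bias)


# Hypothetical calibration sets: coordinate 0 plays the role of "difficulty".
router = LatentRouterSketch(dim=2)
router.onboard("model-a", [[0.1, 0.0], [0.2, 1.0], [0.8, 0.0], [0.9, 1.0]], [1, 1, 0, 0])
snapshot = router.profiles["model-a"]
router.onboard("model-b", [[0.1, 0.0], [0.5, 1.0], [0.7, 0.0], [0.9, 1.0]], [1, 1, 1, 0])
print(router.success_prob("model-a", [0.15, 0.5]), router.success_prob("model-a", [0.85, 0.5]))
```

Adding `model-b` leaves `model-a`'s profile exactly as it was, which is the property that makes onboarding cheap in this toy setting; how ZeroRouter actually estimates a new model's profile is not described in this listing.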

Findings

01 · Outperforms baselines in accuracy, cost, and latency

02 · Enables zero-shot onboarding of new models

03 · Reduces retraining costs significantly

Abstract

The rapid proliferation of Large Language Models (LLMs) has led to a fragmented and inefficient ecosystem, a state of "model lock-in" where seamlessly integrating novel models remains a significant bottleneck. Current routing frameworks require exhaustive, costly retraining, hindering scalability and adaptability. We introduce ZeroRouter, a new paradigm for LLM routing that breaks this lock-in. Our approach is founded on a universal latent space, a model-agnostic representation of query difficulty that fundamentally decouples the characterization of a query from the profiling of a model. This allows for zero-shot onboarding of new models without full-scale retraining. ZeroRouter features a context-aware predictor that maps queries to this universal space and a dual-mode optimizer that balances accuracy, cost, and latency. Our framework consistently outperforms all baselines,…
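The abstract does not say how the dual-mode optimizer works. The sketch below assumes two modes, a weighted-utility mode and a budget-constrained mode, each operating on per-model predictions of accuracy, cost, and latency for a single query. The `Candidate` fields, both functions, and all numbers are hypothetical illustrations, not the paper's formulation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    p_correct: float  # predicted probability of a correct answer for this query
    cost: float       # estimated cost of this query on this model (arbitrary units)
    latency: float    # estimated latency in seconds


def route_weighted(cands: List[Candidate], w_cost: float,
                   w_latency: float) -> Candidate:
    # Assumed mode 1: maximize a scalarized utility (accuracy minus weighted penalties).
    return max(cands, key=lambda c: c.p_correct - w_cost * c.cost - w_latency * c.latency)


def route_constrained(cands: List[Candidate], max_cost: float,
                      max_latency: float) -> Optional[Candidate]:
    # Assumed mode 2: highest predicted accuracy among models within both budgets.
    feasible = [c for c in cands if c.cost <= max_cost and c.latency <= max_latency]
    if not feasible:
        return None  # no model satisfies the budgets
    return max(feasible, key=lambda c: c.p_correct)


# Hypothetical per-query predictions for three models.
cands = [
    Candidate("large", 0.92, 0.030, 4.0),
    Candidate("medium", 0.85, 0.008, 1.5),
    Candidate("small", 0.70, 0.001, 0.4),
]
print(route_weighted(cands, w_cost=5.0, w_latency=0.02).name)
print(route_constrained(cands, max_cost=0.05, max_latency=1.0).name)
```

With these made-up numbers the weighted mode picks `medium`, while the latency cap of 1.0 s leaves only `small` feasible in the constrained mode. Scalarized weights are easy to tune but trade objectives off implicitly; hard caps make the guarantee explicit at the risk of having no feasible model, which this sketch signals by returning `None`.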

Videos

Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Graph Neural Networks