RouteLMT: Learned Sample Routing for Hybrid LLM Translation Deployment

Yingfeng Luo; Hongyu Liu; Dingyang Lin; Kaiyan Chang; Chenglong Wang; Bei Li; Quan Du; Tong Xiao; Jingbo Zhu

arXiv:2604.22520·cs.CL·April 27, 2026


TL;DR

RouteLMT introduces an in-model routing method for hybrid LLM translation systems, predicting the large model's marginal gain to optimize cost-quality trade-offs without external predictors.

Contribution

It formulates routing as a budget allocation problem and proposes RouteLMT, an efficient in-model predictor that improves routing decisions over heuristic methods.

Findings

01

RouteLMT outperforms heuristic and baseline methods in quality-budget trade-offs.

02

The method effectively predicts the large model's marginal gain using prompt-token representations.

03

A guarded variant mitigates severe quality losses in routing decisions.

Abstract

Large Language Models (LLMs) have achieved remarkable performance in Machine Translation (MT), but deploying them at scale remains prohibitively expensive. A widely adopted remedy is the hybrid system paradigm, which balances cost and quality by serving most requests with a small model and selectively routing a fraction to a large model. However, existing routing strategies often rely on heuristics, external predictors, or absolute quality estimation, which fail to capture whether the large model actually provides a worthwhile improvement over the small one. In this paper, we formulate routing as a budget allocation problem and identify marginal gain, i.e., the large model's improvement over the small model, as the optimal signal for budgeted decisions. Building on this, we propose RouteLMT (routing for LLM-based MT), an efficient in-model router that predicts this expected…
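The budget-allocation framing in the abstract can be illustrated with a minimal sketch. It assumes a per-request predicted marginal gain is already available and a budget expressed as the fraction of requests the large model may serve. The function name `route_by_marginal_gain`, its parameters, and the gain values are hypothetical, chosen for illustration; this is not the paper's predictor or its exact decision rule, only a simple top-k allocation under those assumptions.

```python
from typing import List


def route_by_marginal_gain(predicted_gains: List[float], budget_fraction: float) -> List[bool]:
    """Pick which requests go to the large model under a fixed budget.

    predicted_gains: hypothetical per-request estimates of
        (large-model quality - small-model quality).
    budget_fraction: share of requests the large model may serve, in [0, 1].
    Returns one flag per request: True = large model, False = small model.
    """
    if not 0.0 <= budget_fraction <= 1.0:
        raise ValueError("budget_fraction must be in [0, 1]")

    n = len(predicted_gains)
    k = int(budget_fraction * n)  # number of large-model slots, rounded down

    # Rank requests by predicted gain, highest first, and fill the slots.
    ranked = sorted(range(n), key=lambda i: predicted_gains[i], reverse=True)
    chosen = set(ranked[:k])
    return [i in chosen for i in range(n)]


# Illustrative values only: four requests, half of them may use the large model.
decisions = route_by_marginal_gain([0.1, 0.5, -0.2, 0.3], budget_fraction=0.5)
```

Ranking by the gain rather than by the small model's absolute quality is the point of the sketch: a request the small model handles poorly is not worth routing if the large model is predicted to do no better on it.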
