RouteLMT: Learned Sample Routing for Hybrid LLM Translation Deployment
Yingfeng Luo, Hongyu Liu, Dingyang Lin, Kaiyan Chang, Chenglong Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu

TL;DR
RouteLMT introduces an in-model routing method for hybrid LLM translation systems, predicting the large model's marginal gain to optimize cost-quality trade-offs without external predictors.
Contribution
The paper formulates routing as a budget allocation problem and proposes RouteLMT, an efficient in-model predictor of the large model's marginal gain that improves routing decisions over heuristic methods.
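To make the budget allocation view concrete, here is a minimal sketch of how a router might spend a fixed large-model budget once a marginal gain estimate (the large model's expected improvement over the small model) is available for each request. It is an illustration under stated assumptions, not the paper's implementation: the function name `route_by_marginal_gain`, the `budget_fraction` parameter, the uniform per-request cost, and the example gain values are all hypothetical, and the gains are supplied as plain numbers rather than predicted in-model.

```python
def route_by_marginal_gain(predicted_gains, budget_fraction):
    """Pick which requests go to the large model under a fixed budget.

    predicted_gains: one estimated marginal gain per request (hypothetical
        values here; in a real system a predictor would supply them).
    budget_fraction: share of requests the large model may serve (0.0 to 1.0).
    Assumes every large-model call costs the same, so spending the budget
    on the highest predicted gains maximizes the total predicted gain.
    Returns a list of booleans; True means "send to the large model".
    """
    if not 0.0 <= budget_fraction <= 1.0:
        raise ValueError("budget_fraction must be between 0 and 1")

    n = len(predicted_gains)
    k = int(budget_fraction * n)  # number of requests the budget covers

    # Rank request indices from highest to lowest predicted gain.
    ranked = sorted(range(n), key=lambda i: predicted_gains[i], reverse=True)
    chosen = set(ranked[:k])

    return [i in chosen for i in range(n)]


# Illustrative gains for five requests; a negative value means the small
# model is expected to do better than the large one on that request.
gains = [0.02, 0.30, -0.05, 0.12, 0.01]
decisions = route_by_marginal_gain(gains, budget_fraction=0.4)
print(decisions)
```

With a 40% budget over five requests, the sketch routes the two requests with the highest predicted gain to the large model and leaves the rest with the small model. Ranking by absolute quality of the small model's output would instead pick requests that look bad regardless of whether the large model can improve them, which is the distinction the marginal gain signal is meant to capture.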
Findings
RouteLMT outperforms heuristic and baseline methods in quality-budget trade-offs.
The method effectively predicts the large model's marginal gain using prompt-token representations.
A guarded variant mitigates severe quality losses in routing decisions.
Abstract
Large Language Models (LLMs) have achieved remarkable performance in Machine Translation (MT), but deploying them at scale remains prohibitively expensive. A widely adopted remedy is the hybrid system paradigm, which balances cost and quality by serving most requests with a small model and selectively routing a fraction to a large model. However, existing routing strategies often rely on heuristics, external predictors, or absolute quality estimation, which fail to capture whether the large model actually provides a worthwhile improvement over the small one. In this paper, we formulate routing as a budget allocation problem and identify marginal gain, i.e., the large model's improvement over the small model, as the optimal signal for budgeted decisions. Building on this, we propose RouteLMT (routing for LLM-based MT), an efficient in-model router that predicts this expected…
