MTRouter: Cost-Aware Multi-Turn LLM Routing with History-Model Joint Embeddings

Yiqun Zhang; Hao Li; Zihan Wang; Shi Feng; Xiaocui Yang; Daling Wang; Bo Zhang; Lei Bai; Shuyue Hu

arXiv:2604.23530 · cs.CL · April 28, 2026

TL;DR

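MTRouter is a cost-aware router for multi-turn LLM tasks. It encodes the interaction history and each candidate model into a joint embedding and uses that embedding to choose which model to invoke at each turn. Relative to GPT-5, it cuts total cost by 58.7% on ScienceWorld while surpassing GPT-5's performance, and by 43.4% on HLE with competitive accuracy.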

Contribution

MTRouter introduces joint history-model embeddings and an outcome estimator, learned from logged trajectories, that predicts turn-level model utility for cost-aware multi-turn routing. It is reported to outperform prior routing methods.

Findings

01. Surpasses GPT-5 on ScienceWorld while reducing total cost by 58.7%.

02. Achieves competitive accuracy on HLE while reducing total cost by 43.4% relative to GPT-5.

03. Switches models less often and tolerates errors better than prior routers.

Abstract

Multi-turn, long-horizon tasks are increasingly common for large language models (LLMs), but solving them typically requires many sequential model invocations, accumulating substantial inference costs. Here, we study cost-aware multi-turn LLM routing: selecting which model to invoke at each turn from a model pool, given a fixed cost budget. We propose MTRouter, which encodes the interaction history and candidate models into joint history-model embeddings, and learns an outcome estimator from logged trajectories to predict turn-level model utility. Experiments show that MTRouter improves the performance-cost trade-off: on ScienceWorld, it surpasses GPT-5 while reducing total cost by 58.7%; on Humanity's Last Exam (HLE), it achieves competitive accuracy while reducing total cost by 43.4% relative to GPT-5, and these gains even carry over to held-out tasks. Further analyses reveal several…
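The routing loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the joint embedding is a plain concatenation of the history vector, the model vector, and their element-wise product; the outcome estimator is a linear scorer with fixed weights rather than one learned from logged trajectories; and the selection rule (predicted utility minus a cost penalty, restricted to models that fit the remaining budget) is one plausible way to make the choice cost-aware. The names `Candidate`, `joint_embedding`, `estimate_utility`, `route_turn`, and `cost_weight` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str                    # hypothetical model identifier
    embedding: Sequence[float]   # hypothetical model embedding
    cost_per_turn: float         # hypothetical cost of one invocation


def joint_embedding(history: Sequence[float], model: Sequence[float]) -> list:
    """Assumed joint embedding: [history, model, history * model]."""
    if len(history) != len(model):
        raise ValueError("history and model embeddings must have equal length")
    return list(history) + list(model) + [h * m for h, m in zip(history, model)]


def estimate_utility(weights: Sequence[float],
                     history: Sequence[float],
                     model: Sequence[float]) -> float:
    """Assumed outcome estimator: a linear score over the joint embedding."""
    z = joint_embedding(history, model)
    if len(weights) != len(z):
        raise ValueError("weight vector does not match joint embedding size")
    return sum(w * x for w, x in zip(weights, z))


def route_turn(history: Sequence[float],
               pool: Sequence[Candidate],
               weights: Sequence[float],
               remaining_budget: float,
               cost_weight: float = 1.0) -> Optional[Candidate]:
    """Pick the affordable model with the best utility-minus-cost score.

    Returns None when no candidate fits the remaining budget.
    """
    affordable = [c for c in pool if c.cost_per_turn <= remaining_budget]
    if not affordable:
        return None
    return max(
        affordable,
        key=lambda c: estimate_utility(weights, history, c.embedding)
        - cost_weight * c.cost_per_turn,
    )
```

In a multi-turn setting, a caller would invoke `route_turn` once per turn, subtract the chosen model's cost from the remaining budget, and recompute the history embedding from the updated interaction before the next call.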

Code & Models

Repositories

ZhangYiqun018/MTRouter (GitHub)
