TL;DR
MTRouter is a cost-aware routing method for multi-turn LLM tasks that encodes history and model interactions to optimize model selection, significantly reducing inference costs while maintaining or improving performance.
Contribution
It introduces a novel joint history-model embedding approach and an outcome estimator for efficient, cost-aware multi-turn LLM routing, outperforming prior methods.
Findings
On ScienceWorld, surpasses GPT-5 while reducing total cost by 58.7%.
On Humanity's Last Exam (HLE), achieves competitive accuracy while reducing total cost by 43.4% relative to GPT-5.
Exhibits fewer model switches and greater error tolerance compared to prior routers.
Abstract
Multi-turn, long-horizon tasks are increasingly common for large language models (LLMs), but solving them typically requires many sequential model invocations, accumulating substantial inference costs. Here, we study cost-aware multi-turn LLM routing: selecting which model to invoke at each turn from a model pool, given a fixed cost budget. We propose MTRouter, which encodes the interaction history and candidate models into joint history-model embeddings, and learns an outcome estimator from logged trajectories to predict turn-level model utility. Experiments show that MTRouter improves the performance-cost trade-off: on ScienceWorld, it surpasses GPT-5 while reducing total cost by 58.7%; on Humanity's Last Exam (HLE), it achieves competitive accuracy while reducing total cost by 43.4% relative to GPT-5, and these gains even carry over to held-out tasks. Further analyses reveal several…
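As a rough illustration of the per-turn decision the abstract describes, the sketch below picks one model per turn from a pool under a remaining cost budget. It is a minimal sketch under stated assumptions, not the paper's implementation: the names `Model`, `route_turn`, and `toy_estimator`, the flat per-turn cost, and the scoring rule (estimated utility minus weighted cost) are all hypothetical. In MTRouter the outcome estimator is learned from logged trajectories over joint history-model embeddings; here a hand-written placeholder stands in for it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_turn: float  # assumption: a flat cost per invocation


def route_turn(
    history: Sequence[str],
    models: Sequence[Model],
    estimate_utility: Callable[[Sequence[str], Model], float],
    remaining_budget: float,
    cost_weight: float = 1.0,
) -> Optional[Model]:
    """Pick the affordable model with the best utility-minus-cost score.

    The scoring rule is an illustrative assumption, not taken from the paper.
    Returns None when no model fits in the remaining budget.
    """
    affordable = [m for m in models if m.cost_per_turn <= remaining_budget]
    if not affordable:
        return None
    return max(
        affordable,
        key=lambda m: estimate_utility(history, m) - cost_weight * m.cost_per_turn,
    )


def toy_estimator(history: Sequence[str], model: Model) -> float:
    """Hypothetical stand-in for a learned outcome estimator.

    It pretends the larger model becomes more useful as the history grows.
    """
    base = {"small": 0.5, "large": 0.9}[model.name]
    bonus = 0.05 * len(history) if model.name == "large" else 0.0
    return base + bonus


if __name__ == "__main__":
    pool = [Model("small", 0.1), Model("large", 1.0)]
    history: list = []
    budget = 3.0
    for turn in range(6):
        choice = route_turn(history, pool, toy_estimator, budget, cost_weight=0.5)
        if choice is None:
            break  # budget exhausted
        budget -= choice.cost_per_turn
        history.append(f"turn {turn}: {choice.name}")
    print(history)
```

The loop at the bottom shows the intended usage: the router is called once per turn, the chosen model's cost is deducted from the budget, and the turn is appended to the history that informs the next decision.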
