From Myopic Selection to Long-Horizon Awareness: Sequential LLM Routing for Multi-Turn Dialogue
Jiarui Zhang, Xiangyu Liu, Yong Hu, Chaoyue Niu, Hang Zeng, Shaojie Tang, Fan Wu, Guihai Chen

TL;DR
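This paper introduces DialRouter, a long-horizon sequential routing method for multi-turn dialogue with LLMs. It uses Monte Carlo tree search (MCTS) to collect high-reward dialogue trajectories, then trains a lightweight routing policy on that data so that routing at inference time needs no online search. The result improves on existing single-turn routing approaches.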
Contribution
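The paper reframes LLM routing for multi-turn dialogue as a sequential decision problem with delayed rewards, rather than a series of myopic single-turn selections. DialRouter addresses it by combining MCTS over LLM choices with a routing policy learned from the search-derived trajectories and augmented with retrieval-based future state approximation.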
Findings
DialRouter outperforms single LLMs and existing routing baselines in task success rate.
It achieves a better performance-cost trade-off with cost-aware rewards.
Experiments on both open-domain and domain-specific dialogue tasks validate its effectiveness.
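The summary does not give the form of the cost-aware reward. As a minimal sketch, assuming a simple linear penalty on the total cost of the selected LLMs, the trade-off could look like the following. The function name, the `cost_weight` parameter, and the cost units are hypothetical and are not taken from the paper.

```python
def cost_aware_reward(task_success, turn_costs, cost_weight=0.25):
    """Hypothetical cost-aware reward for one finished dialogue.

    task_success: final task score in [0, 1], observed only at the end.
    turn_costs:   per-turn cost of the LLM selected at each turn
                  (arbitrary units; an assumption for illustration).
    cost_weight:  how strongly cost is traded off against success.
    """
    return task_success - cost_weight * sum(turn_costs)


# A successful dialogue routed through cheap models keeps most of its reward;
# the same success routed through expensive models is penalized more.
cheap = cost_aware_reward(1.0, [1, 1, 1])
expensive = cost_aware_reward(1.0, [3, 3, 3])
```

Setting `cost_weight` to zero recovers a pure success reward, so the same search and training pipeline can be run with or without cost awareness.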
Abstract
Multi-turn dialogue is the predominant form of interaction with large language models (LLMs). While LLM routing is effective in single-turn settings, existing methods fail to maximize cumulative performance in multi-turn dialogue due to interaction dynamics and delayed rewards. To address this challenge, we move from myopic, single-turn selection to long-horizon sequential routing for multi-turn dialogue. Accordingly, we propose DialRouter, which first performs MCTS to explore dialogue branches induced by different LLM selections and collect trajectories with high cumulative rewards. DialRouter then learns a lightweight routing policy from search-derived data, augmented with retrieval-based future state approximation, enabling multi-turn routing without online search. Experiments on both open-domain and domain-specific dialogue tasks across diverse candidate sets of both open-source and…
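The search stage described in the abstract can be illustrated with a toy example. The sketch below runs a generic UCB1-based MCTS over per-turn LLM choices with a reward that is only observed at the end of the dialogue. It is not DialRouter's implementation: the model names, the three-turn horizon, the reward table, and all function names are assumptions made for illustration, and the real system would query actual LLMs and a task evaluator instead of `terminal_reward`.

```python
import math
import random

MODELS = ["small", "large"]  # hypothetical candidate LLMs
HORIZON = 3                  # hypothetical number of dialogue turns


def terminal_reward(route):
    """Toy delayed reward, observed only after the last turn.

    A strong first turn followed by cheap turns is best here, which a
    myopic per-turn selector would have no way of discovering.
    """
    if tuple(route) == ("large", "small", "small"):
        return 1.0
    return 0.2 if route[0] == "large" else 0.0


class Node:
    def __init__(self, route):
        self.route = route    # tuple of models chosen so far
        self.children = {}    # model name -> Node
        self.visits = 0
        self.value_sum = 0.0


def ucb_child(node, c=1.4):
    """Pick the child with the highest UCB1 score."""
    def score(child):
        mean = child.value_sum / child.visits
        bonus = c * math.sqrt(math.log(node.visits) / child.visits)
        return mean + bonus
    return max(node.children.values(), key=score)


def mcts(iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded and non-terminal.
        while len(node.route) < HORIZON and len(node.children) == len(MODELS):
            node = ucb_child(node)
            path.append(node)
        # Expansion: add one untried model choice for the next turn.
        if len(node.route) < HORIZON:
            untried = [m for m in MODELS if m not in node.children]
            model = rng.choice(untried)
            node.children[model] = Node(node.route + (model,))
            node = node.children[model]
            path.append(node)
        # Rollout: finish the dialogue with random model choices.
        route = list(node.route)
        while len(route) < HORIZON:
            route.append(rng.choice(MODELS))
        reward = terminal_reward(route)
        # Backpropagation: credit every node on the path with the final reward.
        for visited in path:
            visited.visits += 1
            visited.value_sum += reward
    return root


def best_route(root):
    """Follow the most-visited child at each turn."""
    node, route = root, []
    while node.children:
        model, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        route.append(model)
    return route


route = best_route(mcts())
```

In this toy setting the high-visit trajectories found by the search are the kind of data a lightweight routing policy could be trained on, so that no tree search is needed when serving real dialogues.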
