From Myopic Selection to Long-Horizon Awareness: Sequential LLM Routing for Multi-Turn Dialogue

Jiarui Zhang; Xiangyu Liu; Yong Hu; Chaoyue Niu; Hang Zeng; Shaojie Tang; Fan Wu; Guihai Chen

arXiv:2604.12385·cs.CL·April 15, 2026

TL;DR

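This paper introduces DialRouter, a long-horizon sequential routing method for multi-turn dialogue with LLMs. It uses Monte Carlo tree search (MCTS) to collect routing trajectories with high cumulative reward, then trains a lightweight routing policy on that data, improving on existing single-turn routing approaches.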

Contribution

The paper presents DialRouter, which combines MCTS-based trajectory collection with a learned routing policy for multi-turn dialogue, addressing the limitations of myopic single-turn routing methods.

Findings

1. DialRouter outperforms single LLMs and existing routing baselines in task success rate.
2. It achieves a better performance-cost trade-off with cost-aware rewards.
3. Experiments on diverse dialogue tasks validate its effectiveness.

Abstract

Multi-turn dialogue is the predominant form of interaction with large language models (LLMs). While LLM routing is effective in single-turn settings, existing methods fail to maximize cumulative performance in multi-turn dialogue due to interaction dynamics and delayed rewards. To address this challenge, we move from myopic, single-turn selection to long-horizon sequential routing for multi-turn dialogue. Accordingly, we propose DialRouter, which first performs MCTS to explore dialogue branches induced by different LLM selections and collect trajectories with high cumulative rewards. DialRouter then learns a lightweight routing policy from search-derived data, augmented with retrieval-based future state approximation, enabling multi-turn routing without online search. Experiments on both open-domain and domain-specific dialogue tasks across diverse candidate sets of both open-source and…
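
To make the contrast between myopic and long-horizon routing concrete, here is a minimal Python sketch on a toy problem. It is not the authors' implementation: the two candidate models, the three-turn horizon, and the reward numbers are all hypothetical, chosen so that the model with the best immediate reward is a poor first choice. The sketch compares a greedy per-turn router with a basic UCT-style MCTS over routing decisions; it omits the learned policy and the retrieval-based future state approximation that the abstract describes.

```python
import math
import random

MODELS = ["small", "large"]  # hypothetical candidate LLMs
HORIZON = 3                  # number of dialogue turns in the toy task


def step_reward(history, model):
    """Toy per-turn reward (made-up numbers). 'small' looks better on turn 0,
    but routing turn 0 to 'large' raises the reward of every later turn."""
    if len(history) == 0:
        return 0.6 if model == "small" else 0.4
    bonus = 0.5 if history[0] == "large" else 0.0
    return (0.3 if model == "small" else 0.2) + bonus


def total_reward(path):
    """Cumulative reward of a full routing sequence."""
    return sum(step_reward(path[:i], m) for i, m in enumerate(path))


def greedy_route():
    """Myopic routing: pick the best immediate reward at each turn."""
    path = ()
    while len(path) < HORIZON:
        path += (max(MODELS, key=lambda m: step_reward(path, m)),)
    return path


class Node:
    def __init__(self, path):
        self.path = path      # routing decisions taken so far
        self.children = {}    # model name -> child Node
        self.visits = 0
        self.value = 0.0      # sum of returns backed up through this node


def mcts_route(iterations=500, c=1.4, seed=0):
    """Basic UCT search over routing sequences; returns the most-visited path."""
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node, trail = root, [root]
        # Selection and expansion: descend by UCT until an untried action exists.
        while len(node.path) < HORIZON:
            untried = [m for m in MODELS if m not in node.children]
            if untried:
                m = rng.choice(untried)
                node.children[m] = Node(node.path + (m,))
                node = node.children[m]
                trail.append(node)
                break
            parent = node
            node = max(
                parent.children.values(),
                key=lambda ch: ch.value / ch.visits
                + c * math.sqrt(math.log(parent.visits) / ch.visits),
            )
            trail.append(node)
        # Rollout: finish the dialogue with random routing choices.
        path = node.path
        while len(path) < HORIZON:
            path += (rng.choice(MODELS),)
        ret = total_reward(path)
        # Backpropagation: credit the cumulative return to every node on the trail.
        for n in trail:
            n.visits += 1
            n.value += ret
    # Read out the routing sequence by following the most-visited children.
    node = root
    while node.children:
        node = max(node.children.values(), key=lambda ch: ch.visits)
    return node.path


if __name__ == "__main__":
    g, s = greedy_route(), mcts_route()
    print("greedy:", g, round(total_reward(g), 2))
    print("mcts:  ", s, round(total_reward(s), 2))
```

In this toy setup the greedy router selects "small" on every turn for a cumulative reward of 1.2, while any sequence that starts with "large" scores at least 1.8. The search finds this because it scores each first-turn choice by the return of the whole dialogue rather than by the immediate reward, which is the delayed-reward effect the abstract points to.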
