Dynamic Mixed-Precision Routing for Efficient Multi-step LLM Interaction

Yuanzhe Li; Jianing Deng; Jingtong Hu; Tianlong Chen; Song Wang; Huanrui Yang

arXiv:2602.02711 · cs.AI · May 15, 2026

TL;DR

This paper introduces Dynamic Mixed-Precision Routing (DMR), a method that adaptively switches between high- and low-precision LLMs during multi-step decision tasks to improve efficiency without sacrificing accuracy.

Contribution

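The paper proposes DMR, an adaptive framework whose router chooses between a high-precision and a low-precision LLM at each decision step. The router is trained in two stages, KL-divergence-based supervised learning followed by Group-Relative Policy Optimization, with the goal of reducing inference cost in long-horizon LLM tasks.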
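The per-step selection described above can be pictured with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `run_episode`, `relative_cost`, the keyword-based stand-in router, and the assumed cost ratio of 0.25 for a low-precision step are illustrative only. In the paper the router is a trained component.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RoutedStep:
    step: int
    precision: str  # "high" or "low"
    action: str


def run_episode(
    observations: List[str],
    router: Callable[[str], bool],      # True means "use the high-precision model"
    high_model: Callable[[str], str],   # stand-in for the high-precision LLM
    low_model: Callable[[str], str],    # stand-in for the quantized LLM
) -> List[RoutedStep]:
    """Route each decision step to one of two models and record the choice."""
    trace = []
    for t, obs in enumerate(observations):
        use_high = router(obs)
        model = high_model if use_high else low_model
        trace.append(RoutedStep(t, "high" if use_high else "low", model(obs)))
    return trace


def relative_cost(trace: List[RoutedStep], low_cost: float = 0.25) -> float:
    """Average per-step cost, with an always-high-precision run normalized to 1.0.

    low_cost is an assumed ratio, not a figure reported in the paper.
    """
    if not trace:
        return 0.0
    total = sum(1.0 if s.precision == "high" else low_cost for s in trace)
    return total / len(trace)


# Toy usage: a keyword rule stands in for the learned router.
toy_router = lambda obs: "buy" in obs
trace = run_episode(
    ["look around", "buy item", "go north"],
    toy_router,
    high_model=lambda obs: f"high:{obs}",
    low_model=lambda obs: f"low:{obs}",
)
print([s.precision for s in trace])  # ['low', 'high', 'low']
print(relative_cost(trace))          # 0.5
```

The sketch only shows the control flow: one routing decision per step, and a cost that falls as more steps go to the cheaper model.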

Findings

1. DMR achieves better accuracy-cost trade-offs than single-precision baselines.
2. Experiments on ALFWorld and WebShop show improved efficiency and success rates.
3. The approach effectively identifies precision-sensitive steps for adaptive routing.

Abstract

Large language models (LLMs) achieve strong performance in long-horizon decision-making tasks through multi-step interaction and reasoning at test time. While practitioners commonly assume that a higher task success rate requires a larger and stronger LLM, multi-step interaction with a large LLM incurs prohibitive inference cost. To address this problem, we explore the use of low-precision quantized LLMs in the long-horizon decision-making process. Observing that interaction steps differ in their sensitivity to precision, we propose Dynamic Mixed-Precision Routing (DMR), a framework that adaptively selects between high-precision and low-precision LLMs at each decision step. The router is trained via a two-stage pipeline, consisting of KL-divergence-based supervised learning that identifies precision-sensitive steps, followed by Group-Relative Policy Optimization…
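One way to read the KL-divergence-based stage is that a step counts as precision-sensitive when the low-precision model's output distribution drifts far from the high-precision one. The sketch below illustrates that reading only. The direction of the divergence, the use of per-step logits over candidate actions, the threshold of 0.1, and the function names are all assumptions, since the abstract here is truncated and does not specify them.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def label_sensitive_steps(
    high_logits: List[List[float]],
    low_logits: List[List[float]],
    threshold: float = 0.1,  # hypothetical cutoff, not from the paper
) -> List[int]:
    """Return 1 for steps where the two models disagree strongly, else 0.

    Labels like these could serve as supervised targets for a router.
    """
    labels = []
    for h, l in zip(high_logits, low_logits):
        kl = kl_divergence(softmax(h), softmax(l))
        labels.append(1 if kl > threshold else 0)
    return labels


# Toy usage: the models agree at step 0 and prefer different actions at step 1.
high = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
low = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
print(label_sensitive_steps(high, low))  # [0, 1]
```

Under this reading, steps labeled 1 would be routed to the high-precision model, and the second training stage would refine the router beyond these labels.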
