Dynamic Mixed-Precision Routing for Efficient Multi-step LLM Interaction
Yuanzhe Li, Jianing Deng, Jingtong Hu, Tianlong Chen, Song Wang, Huanrui Yang

TL;DR
This paper introduces Dynamic Mixed-Precision Routing (DMR), a method that adaptively switches between high- and low-precision LLMs during multi-step decision tasks to improve efficiency without sacrificing accuracy.
Contribution
The paper proposes a novel adaptive framework, DMR, that selects precision levels per step using a two-stage training process, reducing inference costs in long-horizon LLM tasks.
Findings
DMR achieves better accuracy-cost trade-offs than single-precision baselines.
Experiments on ALFWorld and WebShop show improved efficiency and success rates.
The approach effectively identifies precision-sensitive steps for adaptive routing.
Abstract
Large language models (LLMs) achieve strong performance in long-horizon decision-making tasks through multi-step interaction and reasoning at test time. While practitioners commonly believe that a higher task success rate requires a larger and stronger LLM, multi-step interaction with a large LLM incurs prohibitive inference cost. To address this problem, we explore the use of low-precision quantized LLMs in the long-horizon decision-making process. Based on the observation that interaction steps differ in their sensitivity to precision, we propose Dynamic Mixed-Precision Routing (DMR), a framework that adaptively selects between high-precision and low-precision LLMs at each decision step. The router is trained via a two-stage pipeline, consisting of KL-divergence-based supervised learning that identifies precision-sensitive steps, followed by Group-Relative Policy Optimization…
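The idea of identifying precision-sensitive steps with a KL-divergence signal can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the fixed threshold of 0.1, the toy distributions, and the rule of labeling a step sensitive when the divergence exceeds that threshold. The abstract describes a learned router, whereas this sketch only shows how such supervision labels might be derived.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def label_sensitive_steps(high_dists, low_dists, threshold=0.1):
    """Label a step 1 (sensitive) when the low-precision output distribution
    diverges from the high-precision one by more than `threshold`.
    Both the rule and the threshold are hypothetical."""
    return [
        1 if kl_divergence(h, l) > threshold else 0
        for h, l in zip(high_dists, low_dists)
    ]

def route(labels):
    """Map sensitivity labels to a model choice for each step."""
    return ["high" if s else "low" for s in labels]

# Toy output distributions for a two-step episode (made-up numbers).
high = [[0.9, 0.1], [0.5, 0.5]]
low = [[0.88, 0.12], [0.1, 0.9]]

labels = label_sensitive_steps(high, low)
choices = route(labels)
```

In this toy episode the two models nearly agree at the first step and disagree sharply at the second, so only the second step would be sent to the high-precision model. A trained router would have to predict such labels from the interaction state, since computing the divergence directly requires running both models.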
