Quality-of-Service Aware LLM Routing for Edge Computing with Multiple Experts

Jin Yang; Qiong Wu; Zhiying Feng; Zhi Zhou; Deke Guo; Xu Chen

arXiv:2508.00234 · cs.NI · August 4, 2025

TL;DR

This paper introduces a deep reinforcement learning (DRL) framework for routing user requests to edge LLMs, optimizing quality-of-service and resource efficiency despite heterogeneous LLM services and dynamic workloads.

Contribution

It presents a novel DRL-based routing approach with dynamic state abstraction and impact estimation for stable QoS in edge LLM services.

Findings

01. Significant QoS improvement over baselines
02. Enhanced resource efficiency in LLM routing
03. Effective handling of workload heterogeneity

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities, leading to a significant increase in user demand for LLM services. However, cloud-based LLM services often suffer from high latency, unstable responsiveness, and privacy concerns. Therefore, multiple LLMs are usually deployed at the network edge to boost real-time responsiveness and protect data privacy, particularly for many emerging smart mobile and IoT applications. Given the varying response quality and latency of LLM services, a critical issue is how to route user requests from mobile and IoT devices to an appropriate LLM service (i.e., edge LLM expert) to ensure acceptable quality-of-service (QoS). Existing routing algorithms fail to simultaneously address the heterogeneity of LLM services, the interference among requests, and the dynamic workloads necessary for maintaining long-term stable QoS. To meet these…
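The abstract frames routing as choosing, for each request, an edge LLM expert whose response quality and latency are acceptable, where requests already queued at an expert delay new ones. The following is a hypothetical, myopic greedy baseline that illustrates this trade-off. It is not the paper's DRL method: the `EdgeExpert` profile, the `qos_score` weighting, the `latency_weight` value, and the FIFO queueing model are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeExpert:
    """A hypothetical edge LLM expert with a static quality/latency profile."""
    name: str
    quality: float       # assumed expected response quality in [0, 1]
    service_time: float  # assumed mean seconds to serve one request
    queue_len: int = 0   # requests already routed here and not yet finished

    def expected_latency(self) -> float:
        # Interference among requests: a new request waits behind the
        # queued ones (FIFO, one request served at a time; an assumption).
        return (self.queue_len + 1) * self.service_time


def qos_score(expert: EdgeExpert, latency_weight: float) -> float:
    # Hypothetical scalar QoS: quality reward minus a weighted latency penalty.
    return expert.quality - latency_weight * expert.expected_latency()


def route_greedy(experts: List[EdgeExpert], latency_weight: float) -> EdgeExpert:
    # Myopic baseline: pick the expert with the best QoS score right now.
    best = max(experts, key=lambda e: qos_score(e, latency_weight))
    best.queue_len += 1  # the routed request now occupies the chosen expert
    return best


experts = [
    EdgeExpert("large", quality=0.9, service_time=2.0),
    EdgeExpert("small", quality=0.6, service_time=0.5),
]
# Four requests arrive back to back; request completions are not modeled.
routed = [route_greedy(experts, latency_weight=0.15).name for _ in range(4)]
print(routed)  # ['large', 'small', 'small', 'small']
```

After the large expert holds one queued request, every later request goes to the small one: the greedy rule reacts only to the current queue. It ignores how each routing choice shapes future queues and long-term QoS, which is the gap the abstract attributes to existing routing algorithms and which the paper targets with DRL.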

Taxonomy

Topics: IoT and Edge/Fog Computing · Big Data and Digital Economy · Mobile Crowdsensing and Crowdsourcing