Recursive Offloading for LLM Serving in Multi-tier Networks
Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Bo Gao, Jinda Lu, Zheming Yang, Tian Wen

TL;DR
RecServe is a recursive offloading framework for LLM inference in multi-tier (device, edge, cloud) networks. It adjusts offloading decisions dynamically based on task complexity and model confidence, cutting communication overhead by over 50% compared to centralized cloud serving while improving service quality.
Contribution
The paper introduces RecServe, a novel recursive offloading framework with hierarchical confidence evaluation and dynamic thresholding for efficient LLM serving across device, edge, and cloud tiers.
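To make the idea concrete, here is a minimal sketch of confidence-gated recursive offloading across three tiers. It is an illustration under stated assumptions, not the paper's implementation: the `Tier` class, the `serve` function, the `offload_fraction` knob, the quantile-over-recent-history threshold rule, and the toy length-based models are all hypothetical names and choices introduced here. The summary above does not specify how RecServe computes confidence or updates its thresholds.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple

# A model maps a prompt to (answer, confidence in [0, 1]). Hypothetical interface.
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Tier:
    name: str
    model: Model
    # Assumed knob: rough fraction of requests this tier should push upward.
    offload_fraction: float = 0.5
    # Sliding window of recent confidence scores seen at this tier.
    history: Deque[float] = field(default_factory=lambda: deque(maxlen=100))

    def threshold(self) -> float:
        """Dynamic threshold: a quantile of recent confidences (an assumption)."""
        if not self.history:
            return 0.0  # no evidence yet, so accept the local answer
        ranked = sorted(self.history)
        idx = min(int(self.offload_fraction * len(ranked)), len(ranked) - 1)
        return ranked[idx]


def serve(prompt: str, tiers: List[Tier], level: int = 0) -> Tuple[str, str]:
    """Answer at the lowest tier that is confident enough, else recurse upward."""
    tier = tiers[level]
    answer, confidence = tier.model(prompt)
    threshold = tier.threshold()      # computed from scores seen before this request
    tier.history.append(confidence)   # then update the sliding window
    is_top_tier = level == len(tiers) - 1
    if is_top_tier or confidence >= threshold:
        return answer, tier.name
    return serve(prompt, tiers, level + 1)  # offload to the next tier


def make_tiers() -> List[Tier]:
    """Toy tiers whose confidence depends only on prompt length (illustrative)."""
    def device(p: str) -> Tuple[str, float]:
        return "device answer", 0.9 if len(p) < 20 else 0.3

    def edge(p: str) -> Tuple[str, float]:
        return "edge answer", 0.8 if len(p) < 40 else 0.4

    def cloud(p: str) -> Tuple[str, float]:
        return "cloud answer", 0.99

    return [
        Tier("device", device, history=deque([0.2, 0.5, 0.8], maxlen=100)),
        Tier("edge", edge, history=deque([0.3, 0.6, 0.9], maxlen=100)),
        Tier("cloud", cloud),
    ]


if __name__ == "__main__":
    for prompt in ["hi", "x" * 30, "x" * 50]:
        tiers = make_tiers()  # fresh state per request for a reproducible demo
        _, served_by = serve(prompt, tiers)
        print(len(prompt), served_by)
```

In this sketch only requests that a lower tier cannot answer confidently travel further up the hierarchy, which is the mechanism by which a scheme of this kind can reduce traffic to the cloud. A real deployment would derive confidence from the model's outputs rather than from prompt length.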
Findings
RecServe reduces communication overhead by over 50% compared to centralized cloud serving.
RecServe outperforms existing methods like CasServe in service quality.
Experimental results on eight datasets validate RecServe's effectiveness.
Abstract
Heterogeneous device-edge-cloud computing infrastructures have become widely adopted in telecommunication operators and Wide Area Networks (WANs), offering multi-tier computational support for emerging intelligent services. With the rapid proliferation of Large Language Model (LLM) services, efficiently coordinating inference tasks and reducing communication overhead within these multi-tier network architectures becomes a critical deployment challenge. Existing LLM serving paradigms exhibit significant limitations: on-device deployment supports only lightweight LLMs due to hardware constraints, while cloud-centric deployment suffers from resource congestion and considerable prompt communication overhead caused by frequent service requests during peak periods. Although the model-cascading-based inference strategy adapts better to multi-tier networks, its reliance on fine-grained,…
Taxonomy
Topics: Advanced MIMO Systems Optimization · IoT and Edge/Fog Computing · IoT Networks and Protocols
