PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks
Huiyou Zhan, Xuan Zhang, Haisheng Tan, Han Tian, Dongping Yong, Junyang Zhang, Xiang-Yang Li

TL;DR
PICE is a cloud-edge inference system for large language models that raises throughput and lowers latency through progressive, semantic-driven collaboration between cloud and edge models, combined with dynamic scheduling and ensemble learning.
Contribution
This work introduces PICE, a novel semantic-driven progressive inference system that enhances LLM serving efficiency and quality through cloud-edge collaboration and dynamic task management.
Findings
Achieves 1.5-2x throughput improvement
Reduces latency by up to 43%
Potentially improves inference quality over state-of-the-art systems
Abstract
Large language models (LLMs), while driving a new wave of interactive AI applications across numerous domains, suffer from high inference costs and heavy cloud dependency. Motivated by the redundancy phenomenon in linguistics, we propose a progressive inference paradigm over cloud and edge: an LLM in the cloud first generates a sketch of the answer, and small language models (SLMs) at the edge then extend the sketch in parallel to fill in the details. Progressive inference can improve throughput and reduce inference latency, but it faces key implementation challenges, including decreased response quality from SLMs, a tradeoff between the brevity and comprehensiveness of sketches, and increased latency caused by network transmission and edge inference. In this work, we propose and implement PICE, an LLM serving system with semantic-level cloud-edge collaboration,…
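The sketch-then-extend flow described in the abstract can be illustrated with a minimal Python sketch. This is not PICE's implementation: the functions cloud_llm_sketch and edge_slm_expand are hypothetical stand-ins for a cloud LLM and an edge SLM, and a thread pool is assumed here only to show that the sketch points can be expanded independently and in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


# Hypothetical stand-in for the cloud LLM; not part of any real PICE API.
def cloud_llm_sketch(question: str) -> List[str]:
    """Return a short outline (sketch) of the answer, one entry per point."""
    return [f"point {i} about {question}" for i in range(1, 4)]


# Hypothetical stand-in for an edge SLM; not part of any real PICE API.
def edge_slm_expand(point: str) -> str:
    """Expand a single sketch point into a fuller sentence."""
    return f"Details for {point}."


def progressive_inference(
    question: str,
    sketch_fn: Callable[[str], List[str]] = cloud_llm_sketch,
    expand_fn: Callable[[str], str] = edge_slm_expand,
    max_workers: int = 4,
) -> str:
    """Generate a sketch first, then expand every sketch point in parallel."""
    sketch = sketch_fn(question)  # step 1: concise sketch from the "cloud" model
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # step 2: independent expansions; pool.map preserves the sketch order
        expansions = list(pool.map(expand_fn, sketch))
    return " ".join(expansions)


if __name__ == "__main__":
    print(progressive_inference("caching"))
```

Because each sketch point is expanded independently, the expansion step can be spread over several edge workers, which is where the throughput and latency gains the abstract mentions would come from. The scheduling, ensemble, and quality-control mechanisms the paper refers to are not modeled here.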
Taxonomy
Topics: Cloud Data Security Solutions · Service-Oriented Architecture and Web Services · Access Control and Trust
