Synera: Synergistic LLM Serving across Device and Cloud at Scale
Genglin Wang, Liekang Zeng, Bufang Yang, Kaiwei Liu, Guoliang Xing, Chumin Sun, Li Zhou, Jie Sun, Zhenyu Yan

TL;DR
Synera is a device-cloud synergistic system for LLM serving. It improves generation quality and reduces cloud serving cost by optimizing offloading, inference, and batching, targeting the latency and resource limitations of mobile deployments.
Contribution
Synera introduces a device-cloud synergistic mechanism that pairs an on-device SLM with a cloud LLM, with optimizations tailored to LLM inference. It targets the communication bottleneck of cloud offloading and the quality loss of on-device-only SLMs.
Findings
Achieves 1.20-5.47x better generation quality than competitive baselines.
Reduces cloud serving cost by 8.2-16.5%.
Keeps latency on par with those baselines.
Abstract
Large Language Models (LLMs) are becoming key components in various mobile operating systems, driving smart applications like interactive chatbots and personal assistants. While bringing enhanced intelligence to mobile ends, their deployment suffers from a set of performance challenges, especially the generation quality degradation and prolonged latency. Prior works have mainly relied on solutions of cloud offloading or on-device Small Language Models (SLMs). However, the former is usually limited by the communication bottleneck, and the latter sacrifices generation quality due to resource constraints. To mitigate these limitations, this paper proposes Synera, a device-cloud synergistic LLM serving system that applies an efficient SLM-LLM synergistic mechanism. Through empirical studies on LLM's unique computing characteristics, Synera identifies a set of underexplored optimization…
Taxonomy
Topics: IoT and Edge/Fog Computing · Big Data and Digital Economy · Mobile Crowdsensing and Crowdsourcing
