PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration
Junfei Zhan, Haoxun Shen, Zheng Lin, and Tengjiao He

TL;DR
PRISM is a context-aware framework that dynamically balances privacy and inference quality in cloud-edge LLM inference by profiling input sensitivity and adaptively applying differential privacy, reducing energy and latency while maintaining output quality.
Contribution
PRISM introduces a novel, adaptive privacy-aware routing framework that considers input sensitivity for improved privacy-utility trade-offs in cloud-edge LLM inference.
Findings
Reduces energy consumption and latency to 40-50% of those of baseline methods.
Achieves superior privacy-utility trade-offs across various scenarios.
Maintains high output quality under strong privacy constraints.
Abstract
Large Language Models (LLMs) demonstrate impressive capabilities in natural language understanding and generation, but incur high communication overhead and privacy risks in cloud deployments, while facing compute and memory constraints when confined to edge devices. Cloud-edge inference has emerged as a promising paradigm for improving privacy in LLM services by retaining sensitive computations on local devices. However, existing cloud-edge inference approaches apply uniform privacy protection without considering input sensitivity, resulting in unnecessary perturbation and degraded utility even for non-sensitive tokens. To address this limitation, we propose Privacy-aware Routing for Inference with Semantic Modulation (PRISM), a context-aware framework that dynamically balances privacy and inference quality. PRISM executes in four stages: (1) the edge device profiles entity-level…
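The abstract is cut off before it describes PRISM's four stages. The following is therefore only a hypothetical illustration of the general idea stated in the TL;DR: apply differential-privacy noise only to tokens judged sensitive, and use a tighter budget for more sensitive ones. Everything specific here is an assumption and not PRISM's design. That includes the names `epsilon_for` and `perturb_sensitive_tokens`, the 0.5 sensitivity threshold, the linear ε schedule, and the L1 clipping bound. The per-row noise uses the textbook Laplace mechanism, where the scale equals the L1 sensitivity divided by ε. The sketch makes no claim about the end-to-end guarantee PRISM provides.

```python
import numpy as np

# Hypothetical sketch of sensitivity-adaptive perturbation; NOT PRISM's actual algorithm.


def epsilon_for(score: float, threshold: float = 0.5,
                eps_min: float = 1.0, eps_max: float = 10.0) -> float:
    """Map a sensitivity score to a per-token privacy budget (assumed linear schedule).

    A score at `threshold` gets eps_max (least noise); a score of 1.0 gets eps_min (most noise).
    """
    score = min(max(score, threshold), 1.0)  # clamp into [threshold, 1]
    t = (score - threshold) / (1.0 - threshold)
    return eps_max - t * (eps_max - eps_min)


def clip_l1(row: np.ndarray, bound: float) -> np.ndarray:
    """Scale `row` down so that its L1 norm is at most `bound`."""
    norm = float(np.abs(row).sum())
    return row if norm <= bound else row * (bound / norm)


def perturb_sensitive_tokens(embeddings: np.ndarray, scores, threshold: float = 0.5,
                             clip: float = 1.0, rng=None) -> np.ndarray:
    """Add Laplace noise only to rows whose sensitivity score meets `threshold`."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must lie in [0, 1)")
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(embeddings, dtype=float, copy=True)
    for i, s in enumerate(scores):
        if s < threshold:
            continue  # treated as non-sensitive: forwarded unperturbed
        eps = epsilon_for(s, threshold)
        row = clip_l1(out[i], clip)
        # After clipping, any two rows differ by at most 2 * clip in L1 norm,
        # so Laplace noise with scale (2 * clip) / eps is the standard calibration.
        out[i] = row + rng.laplace(0.0, 2.0 * clip / eps, size=row.shape)
    return out


# Usage: the first token (score 0.1) passes through; the second (score 0.9) is noised.
emb = np.array([[0.2, -0.1, 0.4], [0.5, 0.5, 0.5]])
noisy = perturb_sensitive_tokens(emb, [0.1, 0.9], rng=np.random.default_rng(0))
```

The threshold and the ε range are the knobs in this toy version. Raising `threshold` lets more tokens through unperturbed, which favors utility. Lowering `eps_min` adds more noise to the most sensitive entities, which favors privacy. The sensitivity scoring itself is not privatized here. Any formal guarantee would therefore need a separate analysis.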
Taxonomy
Topics: Privacy-Preserving Technologies in Data · IoT and Edge/Fog Computing · Big Data and Digital Economy
