Differentially Private and Communication Efficient Large Language Model Split Inference via Stochastic Quantization and Soft Prompt

Yujie Gu; Richeng Jin; Xiaoyu Ji; Yier Jin; Wenyuan Xu

arXiv:2602.11513·cs.CR·February 13, 2026

TL;DR

This paper introduces DEL, a framework for LLM split inference that combines differential privacy, stochastic quantization, and soft prompts to strengthen privacy and reduce communication cost, without requiring users to deploy local denoising models.

Contribution

The work is the first to leverage soft prompts to balance privacy and utility in privacy-preserving LLM split inference, integrating differential privacy with communication efficiency.

Findings

01 Effective privacy-utility trade-off demonstrated on benchmarks.

02 Significant reduction in communication overhead.

03 Maintains high performance in text generation and understanding tasks.

Abstract

Large Language Models (LLMs) have achieved remarkable performance and received significant research interest. The enormous computational demands, however, hinder local deployment on devices with limited resources. The currently prevalent LLM inference paradigms require users to send queries to service providers for processing, which raises critical privacy concerns. Existing approaches allow users to obfuscate the token embeddings before transmission and utilize local models for denoising. Nonetheless, transmitting the token embeddings and deploying local models may result in excessive communication and computation overhead, preventing practical implementation. In this work, we propose DEL, a framework for Differentially private and communication Efficient LLM split inference. More specifically, an embedding projection module and…
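
The abstract is truncated before it describes DEL's construction, so the following is only a generic illustration of the stochastic quantization idea named in the title, not the paper's mechanism. It is a minimal sketch assuming a clipped embedding vector, evenly spaced quantization levels, and NumPy; the function names, the clipping range, and the number of levels are all hypothetical choices made for this example. Each value is rounded to one of its two neighbouring levels at random, with probabilities chosen so that the dequantized value is unbiased, and only the small integer indices would need to be transmitted.

```python
import numpy as np

def stochastic_quantize(x, num_levels, lo, hi, rng):
    """Map each entry of x to an integer index in [0, num_levels - 1].

    Hypothetical helper: values are clipped to [lo, hi], then rounded up
    to the next level with probability equal to their fractional position
    between the two neighbouring levels (unbiased for clipped inputs).
    """
    x = np.clip(np.asarray(x, dtype=np.float64), lo, hi)
    step = (hi - lo) / (num_levels - 1)
    pos = (x - lo) / step                 # position measured in levels
    lower = np.floor(pos)
    prob_up = pos - lower                 # fractional part in [0, 1)
    idx = lower + (rng.random(x.shape) < prob_up)
    # Guard against floating-point overshoot at the upper boundary.
    idx = np.minimum(idx, num_levels - 1)
    return idx.astype(np.int64)

def dequantize(idx, num_levels, lo, hi):
    """Recover the real-valued level for each integer index."""
    step = (hi - lo) / (num_levels - 1)
    return lo + np.asarray(idx) * step

# Example: 4-bit quantization (16 levels) of a toy embedding in [-1, 1].
rng = np.random.default_rng(0)
embedding = np.array([-1.0, -0.25, 0.3, 0.9])
indices = stochastic_quantize(embedding, 16, -1.0, 1.0, rng)
recovered = dequantize(indices, 16, -1.0, 1.0)
```

With 16 levels each coordinate costs 4 bits instead of a 32-bit float, which is where the communication saving in this sketch comes from. The randomness in the rounding is not by itself a differential privacy guarantee; how DEL calibrates its mechanism is not covered by the excerpt above.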

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning