PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration

Ziqian Zeng; Jianwei Wang; Junyao Yang; Zhengdong Lu; Haoran Li; Huiping Zhuang; Cen Chen

arXiv:2406.01394 · cs.CR · May 29, 2025 · 1 citation

TL;DR

PrivacyRestore is a novel method that enables privacy-preserving inference in large language models by removing and restoring private information, balancing privacy protection with performance and efficiency.

Contribution

It introduces a plug-and-play privacy protection technique using restoration vectors and activation steering, addressing privacy, performance, and inference overhead issues in LLMs.
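In general, activation steering means adding a vector to a model's intermediate activations at inference time. The sketch below is a minimal NumPy illustration, assuming the restoration vector is added to the output of selected attention heads in a single self-attention layer (no causal mask, no output projection). The function `multi_head_attention` and the `steering` dictionary are hypothetical names; the head selection and any scaling PrivacyRestore uses are not described in this summary.

```python
import numpy as np

def multi_head_attention(x, w_q, w_k, w_v, num_heads, steering=None):
    """Single-layer multi-head self-attention returning concatenated head outputs.

    x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_model).
    steering: optional {head_index: vector of shape (d_head,)}, a hypothetical
    hook that adds the vector to that head's output at every position.
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Project and split into heads: (num_heads, seq_len, d_head)
    q = (x @ w_q).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    heads = weights @ v                                   # (heads, seq, d_head)
    if steering:
        for h, vec in steering.items():
            heads[h] = heads[h] + vec                    # broadcast over positions
    # Concatenate heads back into (seq_len, d_model)
    return heads.transpose(1, 0, 2).reshape(seq_len, d_model)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                    # 4 tokens, d_model = 8
w = [rng.normal(size=(8, 8)) for _ in range(3)]
base = multi_head_attention(x, *w, num_heads=2)
restoration = np.ones(4)                       # hypothetical vector for head 1
steered = multi_head_attention(x, *w, num_heads=2, steering={1: restoration})
```

Because the vector is added after the attention weights are applied, only the columns belonging to the steered head change; the rest of the layer output is untouched.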

Findings

01. Effectively protects private information during LLM inference.
02. Maintains acceptable performance and inference overhead.
03. Prevents linear growth of the privacy budget.
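The third finding can be illustrated with basic sequential composition, under the assumption (not spelled out in this summary) that "linear growth" refers to perturbing every token independently: if each of n noisy releases spends ε, the total budget is nε, whereas perturbing one aggregated vector once spends ε regardless of how many privacy spans it summarizes. `sequential_composition` is a hypothetical helper implementing that textbook rule.

```python
def sequential_composition(budgets):
    """Basic sequential composition: releasing several eps_i-private outputs
    of the same input is (sum of eps_i)-private."""
    return sum(budgets)

eps = 1.0
for n in (10, 100, 1000):
    per_token = sequential_composition([eps] * n)  # one noisy release per token
    one_vector = sequential_composition([eps])     # one noisy aggregated vector
    print(f"n={n:5d}  per-token total={per_token:7.1f}  single-vector total={one_vector:.1f}")
```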

Abstract

The widespread use of online Large Language Model (LLM) inference services has raised significant concerns about the potential exposure of private information in user inputs to malicious eavesdroppers. Existing privacy protection methods for LLMs suffer from insufficient privacy protection, performance degradation, or large inference time overhead. To address these limitations, we propose PrivacyRestore, a plug-and-play method to protect the privacy of user inputs during LLM inference. The server first trains restoration vectors for each privacy span and then releases them to clients. A privacy span is defined as a contiguous sequence of tokens within a text that contains private information. The client then aggregates the restoration vectors of all privacy spans in the input into a single meta restoration vector, which is later sent to the server side along with the input without…
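The client-side step can be sketched as follows, under stated assumptions. The aggregation is a plain (optionally weighted) average, because the abstract does not say how PrivacyRestore combines the vectors. The noise step assumes the meta vector is then protected with the standard Euclidean d_X-privacy mechanism (noise density proportional to exp(-ε‖z‖), sampled as a uniformly random direction scaled by a Gamma(dim, 1/ε) radius); the abstract is cut off before describing this. Function names, the dimension, and ε are hypothetical.

```python
import numpy as np

def aggregate_restoration_vectors(vectors, weights=None):
    """Combine per-span restoration vectors into one meta vector.

    ASSUMPTION: a (weighted) average; the paper's actual aggregation
    scheme is not given in this excerpt.
    """
    vectors = np.asarray(vectors, dtype=float)           # (num_spans, dim)
    if weights is None:
        weights = np.ones(len(vectors))
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * vectors).sum(axis=0) / weights.sum()

def dx_privacy_noise(vec, epsilon, rng):
    """Euclidean d_X-privacy (multivariate Laplace) mechanism: add noise with
    density proportional to exp(-epsilon * ||z||), drawn as a uniform random
    direction times a Gamma(dim, 1/epsilon) magnitude."""
    dim = vec.shape[0]
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)               # uniform on the unit sphere
    magnitude = rng.gamma(shape=dim, scale=1.0 / epsilon)
    return vec + magnitude * direction

rng = np.random.default_rng(0)
span_vectors = rng.normal(size=(3, 16))   # hypothetical: 3 privacy spans, dim 16
meta = aggregate_restoration_vectors(span_vectors)
noisy_meta = dx_privacy_noise(meta, epsilon=10.0, rng=rng)  # illustrative epsilon
```

Only `noisy_meta` (plus the text with spans removed) would leave the client in this sketch; a larger ε means a smaller expected noise radius (dim/ε) and weaker protection.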

Peer Reviews

Decision: ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 5 · Confidence 4

Strengths

- The paper proposes a plug-and-play method.
- The authors conduct extensive experiments to validate the effectiveness of their methods.

Weaknesses

- **Identification of privacy spans**. The authors do not discuss in detail how privacy spans are identified. As can be observed in the dataset they provide, sensitive information still remains in the input after the privacy spans are removed. For example, for the raw query "A 31-year-old male has a history of antipsychotic medication usage, nausea, stimulant drug use. The 31-year-old male presents the symptoms of involuntary eye movement, jaw pain, muscle spasms, ptosis, shortn…
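To make this concern concrete, here is a minimal sketch of span removal against a closed list, applied to the beginning of the reviewer's example query. The span list and the function `remove_privacy_spans` are hypothetical, and exact case-insensitive string matching is an assumption, not the paper's identification procedure. Anything not on the list (here the age, the sex, and "nausea") passes through unchanged.

```python
import re

def remove_privacy_spans(text, privacy_spans):
    """Return (text with listed spans removed, spans that were found).

    ASSUMPTION: exact, case-insensitive matching against a closed list;
    longer spans are removed first so they are not split by shorter ones.
    """
    found = []
    for span in sorted(privacy_spans, key=len, reverse=True):
        pattern = re.compile(re.escape(span), re.IGNORECASE)
        if pattern.search(text):
            found.append(span)
            text = pattern.sub("", text)
    text = re.sub(r"\s{2,}", " ", text).strip()  # collapse leftover whitespace
    return text, found

query = ("A 31-year-old male has a history of antipsychotic medication usage, "
         "nausea, stimulant drug use.")
spans = ["stimulant drug use", "antipsychotic medication usage"]  # hypothetical list
cleaned, found = remove_privacy_spans(query, spans)
print(cleaned)
print(found)
```

The leftover commas also show that something was removed at those positions, a form of leakage independent of what the remaining words reveal.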

Reviewer 02 · Rating 3 · Confidence 3

Strengths

The paper raises an important question of the appropriate privacy unit for text processing. While DP for language models has been studied extensively, the field lacks consensus on what threat model should be applied to text, and whether sequence-level or document-level approaches appropriately mitigate privacy risks.

Weaknesses

**List of privacy spans**. The paper assumes a closed list of available privacy spans: fewer than 200 words/phrases are used in the experimental settings. First, this introduces privacy leakage from the mere fact of removing privacy spans, since it reveals to the attacker that the original text contained one of the few "sensitive" items. Second, privacy is contextual, and defining an exhaustive list of sensitive phrases for real-world use might be tricky, even if limited to a narrow domain, e.g. medica

Reviewer 03 · Rating 5 · Confidence 4

Strengths

Comparisons against d_X-privacy token-perturbation methods (applied to the entire input or only to the privacy spans) and against paraphrasing provide interesting baselines in the experiments. I also like the proposed attacks for the experiments on empirical privacy protection. The proposed prompt injection attack, which asks the model to output the original text, is a good additional check of how well sensitive information is actually protected. On the other hand, it seems the inversion attacks could be strengthened (cf. wea

Weaknesses

Theorem 5.1: What is the context, i.e., what assumptions do you make about how DP methods are used here? As you formulate Theorem 5.1, it reads as if it applies to any LDP/CDP/d_X-privacy mechanism. However, that makes no sense, since DP can be applied in use cases other than protecting text, and even for text, conclusions about the required privacy budget depend on how the DP mechanism is applied to the text. So you should specify the actual scope to which your Theorem 5.1 applies
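For reference, the standard definition of d_X-privacy (metric differential privacy), and what it gives when a mechanism perturbs each of n token embeddings independently, can be written as follows. This illustrates why the guarantee depends on how the mechanism is applied to a text; it is not a restatement of the paper's Theorem 5.1.

```latex
% A randomized mechanism M : X -> Y satisfies \varepsilon d_X-privacy if,
% for all x, x' \in X and all measurable S \subseteq Y,
\Pr[M(x) \in S] \le e^{\varepsilon\, d(x, x')}\, \Pr[M(x') \in S]

% If each token x_i of x = (x_1, \dots, x_n) is perturbed independently by such
% a mechanism, the per-token guarantees multiply, so for x, x' of length n,
\Pr[M(x) \in S] \le e^{\varepsilon \sum_{i=1}^{n} d(x_i, x'_i)}\, \Pr[M(x') \in S]

% When every token differs and each d(x_i, x'_i) is bounded below by a constant,
% the exponent grows linearly in n.
```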

Reviewer 04 · Rating 5 · Confidence 3

Strengths

1. The paper focuses on an important problem: privacy-preserving inference for large language models.
2. I believe a hybrid privacy setting, where the input is a mixture of private and public information, is more practical. In particular, the idea of a privacy span is interesting.

Weaknesses

1. There are many moving parts in the framework, which makes it difficult to understand and follow. Rather than only elaborating on the technical aspects of each step, the authors should also discuss a running example, similar to the example user input in Figure 1, to give more intuition about what each step does.
2. It would be beneficial if the authors could provide a background section on the attention mechanism and activation steering methods prior to introducing the methodol

Taxonomy

Topics: Privacy-Preserving Technologies in Data