PSY: Posterior Sampling Based Privacy Enhancer in Large Language Models

Yulian Sun; Li Duan; Yong Li

arXiv:2410.18824·cs.CR·October 25, 2024

Open Access

TL;DR

This paper introduces PSY, a privacy enhancement method for large language models that applies posterior sampling within LoRA to reduce privacy leakage with minimal impact on model performance.

Contribution

The paper proposes a novel privacy enhancement technique called PSY that integrates posterior sampling into LoRA for LLMs, improving privacy protection.

Findings

01. PSY reduces the success rates of membership inference and data extraction attacks.

02. PSY maintains model performance with minimal negative impact.

03. Privacy enhancement is demonstrated across three LLM architectures fine-tuned on three datasets.

Abstract

Privacy vulnerabilities in LLMs, such as leakage from memorization, have been repeatedly identified, and various mitigations have been proposed. LoRA is commonly used to fine-tune LLMs and is a good entry point for inserting privacy-enhancing modules. In this ongoing research, we introduce PSY, a Posterior Sampling based PrivacY enhancer that can be used in LoRA. We propose a simple yet effective realization of PSY using posterior sampling, which prevents privacy leakage from intermediate information and, in turn, preserves the privacy of data owners. We evaluate LoRA extended with PSY against state-of-the-art membership inference and data extraction attacks. The experiments are executed on three different LLM architectures fine-tuned on three datasets with LoRA. In contrast to the commonly used differential privacy method, we find that our proposed modification…
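
The abstract does not spell out how posterior sampling is wired into LoRA, so the following is only an illustrative sketch and not the authors' implementation. It assumes a Gaussian posterior over the low-rank intermediate vector, sampled with the reparameterization trick, so that the intermediate value is a noisy draw and not a deterministic function of the input. The class name `SampledLoRALayer`, the helper `matvec`, and all parameter names are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class SampledLoRALayer:
    """Hypothetical LoRA adapter whose low-rank latent is sampled.

    Assumption (not from the paper): the down-projection produces a mean
    and a log-variance, and the latent z is drawn from N(mean, var).
    """

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        self.rng = random.Random(seed)

        def small(rows, cols):
            return [[self.rng.gauss(0.0, 0.02) for _ in range(cols)]
                    for _ in range(rows)]

        self.W = small(d_out, d_in)          # frozen pretrained weight
        self.A_mu = small(rank, d_in)        # down-projection to the mean
        self.A_logvar = small(rank, d_in)    # down-projection to the log-variance
        # Up-projection starts at zero, as in standard LoRA initialisation.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x, sample=True):
        mu = matvec(self.A_mu, x)
        logvar = matvec(self.A_logvar, x)
        if sample:
            # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
            z = [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                 for m, lv in zip(mu, logvar)]
        else:
            z = mu  # deterministic path, equivalent to plain LoRA
        delta = matvec(self.B, z)
        base = matvec(self.W, x)
        return [b + self.scale * d for b, d in zip(base, delta)]
```

With `sample=False` the layer reduces to an ordinary LoRA forward pass, which makes it straightforward to compare the sampled and deterministic paths on the same input. A trainable version would also need a loss term that regularizes the posterior, which this sketch leaves out.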

Taxonomy

Topics: Privacy-Preserving Technologies in Data