Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Lynn Chua; Badih Ghazi; Yangsibo Huang; Pritish Kamath; Ravi Kumar; Daogao Liu; Pasin Manurangsi; Amer Sinha; Chiyuan Zhang

arXiv:2406.14322 · cs.CL · August 19, 2024

TL;DR

This paper investigates user-level differential privacy in fine-tuning large language models, emphasizing uniform privacy guarantees across users and evaluating mechanisms like Group Privacy and DP-SGD for natural language tasks.

Contribution

It presents a systematic evaluation of user-level DP in LLM fine-tuning, addressing the uneven privacy guarantees that arise under example-level DP and exploring design choices for the best privacy-utility balance.

Findings

01

User-level DP provides more uniform privacy protection across users.

02

Different mechanisms and tuning strategies significantly impact privacy-utility tradeoffs.

03

The study offers insights into effective privacy-preserving fine-tuning methods for LLMs.

Abstract

Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit, current evaluations on LLMs mostly treat each example (text record) as the privacy unit. This leads to uneven user privacy guarantees when contributions per user vary. We therefore study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users. We present a systematic evaluation of user-level DP for LLM fine-tuning on natural language generation tasks. Focusing on two mechanisms for achieving user-level DP guarantees, Group Privacy and User-wise DP-SGD, we investigate design…
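The abstract's point about uneven guarantees can be illustrated with a minimal sketch. It assumes the textbook group-privacy conversion, under which an (ε, δ)-DP guarantee for one example implies a (kε, k·e^((k−1)ε)·δ)-DP guarantee for a group of k examples. This is not necessarily the accounting used in the paper, and the function name, user names, contribution counts, and privacy parameters below are all hypothetical values chosen for illustration.

```python
import math


def group_privacy(eps, delta, k):
    """Textbook group-privacy conversion (an assumption of this sketch).

    An (eps, delta)-DP guarantee for a single example implies a
    (k * eps, k * exp((k - 1) * eps) * delta)-DP guarantee for any
    group of k examples, such as all records from one user.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * eps, k * math.exp((k - 1) * eps) * delta


# Hypothetical number of examples contributed by each user.
contributions = {"user_a": 1, "user_b": 5, "user_c": 20}

# Hypothetical example-level guarantee.
example_eps, example_delta = 0.5, 1e-8

for user, k in contributions.items():
    user_eps, user_delta = group_privacy(example_eps, example_delta, k)
    print(f"{user}: k={k:2d}  eps={user_eps:5.1f}  delta={user_delta:.3e}")
```

Under these assumptions, a user with 20 examples ends up with a much weaker implied guarantee than a user with a single example, even though every example receives the same example-level protection. This is the unevenness that motivates treating the user as the privacy unit.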

Taxonomy

Topics: Access Control and Trust · Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection