Privately Fine-Tuning Large Language Models with Differential Privacy

Rouzbeh Behnia; Mohamamdreza Ebrahimi; Jason Pacheco; Balaji Padmanabhan

arXiv:2210.15042 · cs.CR · May 2, 2023

TL;DR

This paper introduces EWTune, a differential privacy (DP) framework for fine-tuning large language models (LLMs) that provides formal privacy guarantees while improving model performance, mitigating privacy risks while maintaining model utility.

Contribution

The paper presents EWTune, a novel DP fine-tuning method built on the Edgeworth accountant, which provides finite-sample privacy guarantees. This reduces the noise injected during fine-tuning and improves LLM performance.
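To illustrate why a tighter privacy accountant translates into less noise, the sketch below calibrates the Gaussian noise multiplier for a target (ε, δ) budget using the central-limit-theorem (CLT) approximation from Gaussian differential privacy (Dong, Roth & Su). This is not the Edgeworth accountant: the Edgeworth accountant refines this kind of normal approximation with higher-order correction terms and finite-sample guarantees, which the plain CLT approximation lacks. The function names and the example hyperparameters (sampling rate, step count, budget) are hypothetical.

```python
import math
from statistics import NormalDist

# Standard normal CDF from the Python standard library.
_PHI = NormalDist().cdf


def gdp_mu_clt(sample_rate: float, noise_multiplier: float, steps: int) -> float:
    """CLT approximation of the GDP parameter mu for `steps` rounds of
    Poisson-subsampled Gaussian noise (hypothetical helper name)."""
    return sample_rate * math.sqrt(steps * (math.exp(noise_multiplier ** -2) - 1.0))


def delta_for_eps(mu: float, eps: float) -> float:
    """delta(eps) achieved by a mu-GDP mechanism."""
    return _PHI(-eps / mu + mu / 2) - math.exp(eps) * _PHI(-eps / mu - mu / 2)


def eps_for_delta(mu: float, delta: float, eps_max: float = 100.0) -> float:
    """Smallest eps (capped at eps_max) with delta_for_eps(mu, eps) <= delta,
    found by bisection; delta_for_eps is decreasing in eps."""
    lo, hi = 0.0, eps_max
    for _ in range(100):
        mid = (lo + hi) / 2
        if delta_for_eps(mu, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def calibrate_noise(target_eps: float, delta: float, sample_rate: float,
                    steps: int, sigma_lo: float = 0.3, sigma_hi: float = 50.0) -> float:
    """Smallest noise multiplier (within the search range) whose CLT-estimated
    eps does not exceed target_eps; eps shrinks as sigma grows."""
    lo, hi = sigma_lo, sigma_hi
    for _ in range(100):
        mid = (lo + hi) / 2
        eps = eps_for_delta(gdp_mu_clt(sample_rate, mid, steps), delta)
        if eps > target_eps:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # Hypothetical setting: 0.4% sampling rate, 10,000 steps, (2, 1e-5) budget.
    sigma = calibrate_noise(target_eps=2.0, delta=1e-5, sample_rate=0.004, steps=10_000)
    print(f"noise multiplier ~ {sigma:.3f}")
```

Any accountant that reports a smaller ε for the same σ lets this search settle on a smaller σ, which is how a tighter accountant reduces the noise injected during fine-tuning.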

Findings

01

Reduces privacy-induced noise by up to 5.6%.

02

Improves LLM performance by up to 1.1% on NLU tasks.

03

Provides an open-source implementation for community use.

Abstract

Pre-trained Large Language Models (LLMs) are an integral part of modern AI and have led to breakthrough performance on complex AI tasks. Major AI companies with expensive infrastructure are able to develop and train these large models, with millions to billions of parameters, from scratch. Third parties, researchers, and practitioners are increasingly adopting these pre-trained models and fine-tuning them on their private data to accomplish their downstream AI tasks. However, it has been shown that an adversary can extract or reconstruct exact training samples from these LLMs, which can reveal personally identifiable information. This issue has raised deep concerns about the privacy of LLMs. Differential privacy (DP) provides a rigorous framework for adding noise during the training or fine-tuning of LLMs so that extracting the training data becomes infeasible…
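The noise-adding step described in the abstract is commonly realized with DP-SGD (Abadi et al., 2016): clip each example's gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to that norm, and average. The sketch below applies this generic recipe to a toy least-squares model in NumPy. It is not EWTune's training code, and the function name and arguments are hypothetical.

```python
import numpy as np


def dp_sgd_step(w, X, y, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update for a toy least-squares model (hypothetical helper).

    w: (dim,) weights; X: (batch, dim) inputs; y: (batch,) targets.
    """
    # Per-example gradients of 0.5 * (x . w - y)^2, one row per example.
    residuals = X @ w - y
    per_example_grads = residuals[:, None] * X

    # Clip each example's gradient to L2 norm at most clip_norm, which bounds
    # any single example's contribution (the sensitivity of the sum).
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Add Gaussian noise with std = noise_multiplier * clip_norm to the sum,
    # then average over the batch and take a gradient step.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / X.shape[0]
    return w - lr * noisy_grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=256)
    w = np.zeros(4)
    for _ in range(200):
        w = dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
    print(w)  # weights after noisy, clipped training
```

In practice, `noise_multiplier` would be chosen by a privacy accountant for the target (ε, δ) budget, and per-example gradients for an LLM would come from a DP training library rather than a closed-form expression.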

Taxonomy

Topics: Privacy-Preserving Technologies in Data