Private Text Generation by Seeding Large Language Model Prompts

Supriya Nagesh; Justin Y. Chen; Nina Mishra; Tal Wagner

arXiv:2502.13193 · cs.CL · February 20, 2025

TL;DR

This paper introduces DP-KPS, a method that generates private synthetic text from a sensitive corpus by querying a large language model only through differentially private prompts, enabling privacy-preserving data sharing for machine learning.

Contribution

The paper presents DP-KPS, a novel prompt-based approach that achieves differential privacy in synthetic text generation without training or fine-tuning any model.

Findings

1. Synthetic corpora retain predictive power for ML tasks.
2. DP-KPS effectively balances privacy and diversity.
3. The method requires minimal compute and no model training.

Abstract

We explore how private synthetic text can be generated by suitably prompting a large language model (LLM). This addresses a challenge for organizations like hospitals, which hold sensitive text data like patient medical records and wish to share it in order to train machine learning models for medical tasks, while preserving patient privacy. Methods that rely on training or fine-tuning a model may be out of reach, either due to API limits of third-party LLMs, or due to ethical and legal prohibitions on sharing the private data with the LLM itself. We propose Differentially Private Keyphrase Prompt Seeding (DP-KPS), a method that generates a private synthetic text corpus from a sensitive input corpus, by accessing an LLM only through privatized prompts. It is based on seeding the prompts with private samples from a distribution over phrase embeddings, thus capturing the input corpus…
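To make "seeding the prompts with private samples from a distribution over phrase embeddings" concrete, here is a minimal sketch. It is a generic stand-in, not the authors' DP-KPS algorithm. It assumes a public list of candidate keyphrases with precomputed embeddings (not derived from the private data), assigns each private keyphrase embedding to its nearest public candidate, releases a Laplace-noised histogram over the candidates, and samples seed phrases from it for a prompt. The helper names (`laplace`, `nearest_index`, `dp_seed_distribution`, `seeded_prompt`), the prompt template, and the toy 2-D embeddings are all hypothetical.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def nearest_index(candidates, vec):
    # Index of the public candidate embedding closest to `vec` (squared Euclidean distance).
    return min(range(len(candidates)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(candidates[i], vec)))


def dp_seed_distribution(private_docs, candidate_embs, epsilon, rng):
    """Laplace-noised nearest-neighbor histogram over public candidate phrases.

    Each private document is a list of keyphrase embeddings. A document spreads a
    total vote mass of 1 over its keyphrases, so adding or removing one document
    changes the histogram by at most 1 in L1 norm. Laplace noise of scale
    1/epsilon per bin therefore makes the released histogram epsilon-DP under
    add/remove-one-document neighbors (assuming the candidates are public).
    """
    counts = [0.0] * len(candidate_embs)
    for doc in private_docs:
        if not doc:
            continue  # a document with no keyphrases contributes nothing
        weight = 1.0 / len(doc)
        for vec in doc:
            counts[nearest_index(candidate_embs, vec)] += weight
    # Clipping and normalizing are post-processing and cost no extra privacy.
    noisy = [max(0.0, c + laplace(1.0 / epsilon, rng)) for c in counts]
    total = sum(noisy)
    if total == 0.0:
        return [1.0 / len(noisy)] * len(noisy)  # fall back to uniform
    return [x / total for x in noisy]


def seeded_prompt(phrases, probs, k, rng):
    # Sampling seeds from the released distribution is also post-processing.
    seeds = rng.choices(phrases, weights=probs, k=k)
    return "Write a short clinical note that mentions: " + ", ".join(seeds) + "."


if __name__ == "__main__":
    rng = random.Random(0)
    phrases = ["chest pain", "shortness of breath", "type 2 diabetes"]
    candidate_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy public embeddings
    private_docs = [[[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.8]]]  # toy private keyphrase embeddings
    probs = dp_seed_distribution(private_docs, candidate_embs, epsilon=1.0, rng=rng)
    print(seeded_prompt(phrases, probs, k=2, rng=rng))
```

The resulting prompt string would then be sent to the LLM; the embedding model, the candidate set, and the LLM call itself are left out. The nearest-neighbor assignment keeps the released object a fixed-size histogram, which makes the sensitivity argument a one-liner; any privacy guarantee of the actual DP-KPS method should be taken from the paper, not from this sketch.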

Taxonomy

Topics: Topic Modeling · Privacy-Preserving Technologies in Data