Differentially Private Synthetic Data via Foundation Model APIs 2: Text
Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A. Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

TL;DR
This paper introduces Aug-PE, a method that generates differentially private synthetic text using only API access to large language models, avoiding costly model fine-tuning and enabling privacy-preserving NLP applications.
Contribution
Aug-PE extends the Private Evolution algorithm to text data, allowing DP synthetic text generation solely via API access without model training, achieving state-of-the-art utility.
Findings
Aug-PE produces high-quality DP synthetic text comparable to finetuning-based methods.
The method works effectively across multiple benchmark datasets.
It enables privacy-preserving text generation without requiring access to or training of large models.
Abstract
Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to privacy concerns. Generating synthetic replicas of private text data with a formal privacy guarantee, i.e., differential privacy (DP), offers a promising and scalable solution. However, existing methods necessitate DP finetuning of large language models (LLMs) on private data to generate DP synthetic data. This approach is not viable for proprietary LLMs (e.g., GPT-3.5) and also demands considerable computational resources for open-source LLMs. Lin et al. (2024) recently introduced the Private Evolution (PE) algorithm to generate DP synthetic images with only API access to diffusion models. In this work, we propose an augmented PE algorithm, named Aug-PE,…
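The abstract is cut off before it describes the algorithm, so the following is only a minimal sketch of the general Private Evolution idea as it is commonly described: each private sample votes for its nearest synthetic sample in an embedding space, Gaussian noise is added to the vote histogram, and the synthetic set is resampled in proportion to the noisy votes. The function names `dp_nn_histogram` and `resample_indices`, the use of precomputed embeddings, and the squared Euclidean distance are all assumptions made for illustration, not the authors' implementation. The step in which an LLM API generates variations of the selected samples is omitted.

```python
import numpy as np


def dp_nn_histogram(private_emb, synthetic_emb, noise_multiplier, rng):
    """Noisy nearest-neighbor vote histogram over synthetic samples (hypothetical helper)."""
    # Pairwise squared Euclidean distances, shape (n_private, n_synthetic).
    diffs = private_emb[:, None, :] - synthetic_emb[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # Each private sample casts exactly one vote for its nearest synthetic sample.
    nearest = np.argmin(dists, axis=1)
    votes = np.bincount(nearest, minlength=synthetic_emb.shape[0]).astype(float)
    # Adding or removing one private sample changes one bin by 1, so the
    # L2 sensitivity is 1 and the Gaussian noise scale equals noise_multiplier.
    noisy = votes + rng.normal(0.0, noise_multiplier, size=votes.shape)
    # Clipping negatives is post-processing and does not affect the DP guarantee.
    return np.clip(noisy, 0.0, None)


def resample_indices(noisy_hist, n_out, rng):
    """Draw synthetic-sample indices in proportion to the noisy votes (hypothetical helper)."""
    total = noisy_hist.sum()
    if total <= 0:
        # Fall back to uniform sampling if noise wiped out every vote.
        probs = np.full(len(noisy_hist), 1.0 / len(noisy_hist))
    else:
        probs = noisy_hist / total
    return rng.choice(len(noisy_hist), size=n_out, p=probs)


# Toy data: three private and three synthetic embeddings in 2-D.
private = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
synthetic = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
```

In a full loop, the selected synthetic samples would be sent to the LLM API for rephrasing, and the vote-and-resample step would repeat for several iterations, with the privacy budget accounted for across all of them.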
Taxonomy
Topics: Access Control and Trust · Advanced Database Systems and Queries · Scientific Computing and Data Management
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Layer Normalization · Byte Pair Encoding · Dropout · Multi-Head Attention
