Private prediction for large-scale synthetic text generation

Kareem Amin; Alex Bie; Weiwei Kong; Alexey Kurakin; Natalia Ponomareva; Umar Syed; Andreas Terzis; Sergei Vassilvitskii

arXiv:2407.12108 · cs.LG · October 10, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a method for generating large-scale, high-quality synthetic text with differential privacy guarantees. The gains come from an improved privacy analysis, a better private selection mechanism, and the use of public predictions, which together expand the range of potential applications.

Contribution

It presents techniques for differentially private text generation via private prediction that scale to thousands of synthetic data points, whereas previous work in this paradigm reported fewer than 10 examples at reasonable privacy levels.

Findings

01. Generated thousands of synthetic data points with privacy guarantees

02. Enhanced privacy analysis and selection mechanisms

03. Effective use of public predictions for structured data

Abstract

We present an approach for generating differentially private synthetic text using large language models (LLMs), via private prediction. In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees. This is in contrast to approaches that train a generative model on potentially sensitive user-supplied source data and seek to ensure the model itself is safe to release. We prompt a pretrained LLM with source data, but ensure that next-token predictions are made with differential privacy guarantees. Previous work in this paradigm reported generating a small number of examples (<10) at reasonable privacy levels, an amount of data that is useful only for downstream in-context learning or prompting. In contrast, we make changes that allow us to generate thousands of high-quality synthetic data points, greatly expanding the set of…
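To make the idea of a differentially private next-token prediction concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes a common construction in which each sensitive prompt yields its own logit vector, the logits are clipped to a bound `c`, averaged, and a token is sampled from the softmax at a given temperature. Sampling this way has the form of the exponential mechanism with the averaged logit as the score. The function names, the toy logits, and the per-token bound (the textbook exponential-mechanism bound under replace-one adjacency) are all assumptions made for illustration, and no real LLM is called.

```python
import math
import random


def clip(logits, c):
    """Clamp each logit into [-c, c] so a single prompt has bounded influence."""
    return [max(-c, min(c, z)) for z in logits]


def private_next_token(per_prompt_logits, c, temperature, rng):
    """Sample a token index from the softmax of averaged, clipped logits.

    per_prompt_logits is a hypothetical stand-in for LLM outputs: one logit
    vector per sensitive prompt, all over the same vocabulary.
    """
    n = len(per_prompt_logits)
    vocab = len(per_prompt_logits[0])
    clipped = [clip(logits, c) for logits in per_prompt_logits]
    avg = [sum(logits[v] for logits in clipped) / n for v in range(vocab)]
    # Numerically stable softmax over the averaged scores.
    top = max(avg)
    weights = [math.exp((a - top) / temperature) for a in avg]
    total = sum(weights)
    probs = [w / total for w in weights]
    token = rng.choices(range(vocab), weights=probs, k=1)[0]
    return token, probs


def per_token_epsilon(c, n, temperature):
    """Textbook exponential-mechanism bound (an assumption, not the paper's analysis).

    Replacing one prompt moves each averaged score by at most 2c/n, and
    sampling proportional to exp(score / temperature) is
    (2 * sensitivity / temperature)-DP, which gives 4c / (n * temperature).
    """
    return 4.0 * c / (n * temperature)


# Toy example: four sensitive prompts, vocabulary of three tokens.
rng = random.Random(0)
toy_logits = [
    [0.1, 0.2, 5.0],
    [0.0, 0.3, 4.0],
    [0.2, 0.1, 6.0],
    [0.1, 0.0, 3.5],
]
token, probs = private_next_token(toy_logits, c=2.0, temperature=0.5, rng=rng)
eps = per_token_epsilon(c=2.0, n=len(toy_logits), temperature=0.5)
```

In this sketch, a larger batch of prompts or a higher temperature lowers the per-token privacy cost, at the price of more compute or noisier text. Generating a full example repeats the step once per token, so the total cost depends on how the per-token costs are composed.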

Taxonomy

Topics: Recommender Systems and Techniques

Methods: Sparse Evolutionary Training · Softmax