DP-RFT: Learning to Generate Synthetic Text via Differentially Private Reinforcement Fine-Tuning

Fangyuan Xu; Sihao Chen; Zinan Lin; Taiwei Shi; Sydney Graham; Pei Zhou; Mengting Wan; Alex Stein; Virginia Estellers; Charles Chen; Morris Sharp; Richard Speyer; Tadas Baltrusaitis; Jennifer Neville; Eunsol Choi; Longqi Yang

arXiv:2602.18633·cs.CL·February 24, 2026

Open Access · 3 Reviews

TL;DR

DP-RFT introduces a reinforcement learning approach that enables large language models to generate high-quality synthetic text with formal privacy guarantees, without direct access to private data, improving fidelity and utility.

Contribution

We propose DP-RFT, a novel online reinforcement learning method that trains LLMs to generate synthetic data resembling a private corpus without eyes-on access to it, using DP-protected nearest-neighbor votes as rewards.
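
The idea of "DP-protected neighbor votes as rewards" can be illustrated with a rough sketch. This is not the paper's implementation: the abstract on this page is truncated before it describes the reward, so the cosine-similarity nearest-neighbor rule, the Gaussian noise, and the names `cosine`, `dp_vote_rewards`, and `noise_std` are all assumptions made for illustration. Each private embedding votes for its closest synthetic candidate, and noise is added to the vote histogram before anything leaves the data owner's side.

```python
import math
import random
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def dp_vote_rewards(
    private_emb: Sequence[Sequence[float]],
    synthetic_emb: Sequence[Sequence[float]],
    noise_std: float,
    rng: random.Random,
) -> List[float]:
    """Hypothetical reward: noisy count of private examples nearest to each candidate.

    Assumption: each private example casts exactly one vote, so adding or
    removing one example changes one count by 1 (L2 sensitivity 1), and
    Gaussian noise with standard deviation `noise_std` is added per count.
    Calibrating `noise_std` to a target (epsilon, delta) is out of scope here.
    """
    votes = [0.0] * len(synthetic_emb)
    for p in private_emb:
        # Index of the synthetic candidate most similar to this private example.
        nearest = max(
            range(len(synthetic_emb)),
            key=lambda j: cosine(p, synthetic_emb[j]),
        )
        votes[nearest] += 1.0
    # Only these noisy counts are released to the generator as rewards.
    return [v + rng.gauss(0.0, noise_std) for v in votes]


if __name__ == "__main__":
    private = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    print(dp_vote_rewards(private, candidates, noise_std=1.0, rng=random.Random(0)))
```

In this toy setup the generator only ever sees the noisy per-candidate counts, never the private texts or their embeddings, which is the sense in which such a reward could be "eyes-off". A real system would also need a privacy accountant to track the total budget across training steps.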

Findings

01. DP-RFT produces synthetic data with higher fidelity than un-finetuned models.

02. The method maintains privacy while improving downstream utility.

03. DP-RFT effectively generates domain-specific long-form text such as news and medical abstracts.

Abstract

Differentially private (DP) synthetic data generation plays a pivotal role in developing large language models (LLMs) on private data, where data owners cannot provide eyes-on access to individual examples. Generating DP synthetic data typically involves a difficult trade-off. On one hand, DP finetuning methods train an LLM as a synthetic data generator with formal privacy guarantees, yet they still require the raw content of private examples for model training. On the other hand, methods that avoid direct exposure to private data are bounded by an off-the-shelf, un-finetuned model, whose outputs often lack domain fidelity. Can we train an LLM to generate high-quality synthetic text without eyes-on access to individual private examples? In this work, we introduce Differentially Private Reinforcement Fine-Tuning (DP-RFT), an online reinforcement learning algorithm for synthetic data generation with…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 8·Confidence 4

Strengths

- Experimental setups are solid, and results are convincing. Using the same generation model, embedder, and initial prompt for DP-RFT and AUG-PE, DP-RFT outperforms AUG-PE and also the baseline of just using the initial prompt with the baseline model.
- The method has numerous advantages. In addition to the fact that private data is never directly ingested by the model (which mitigates privacy risks from possible DP implementation gaps), the method is very simple; furthermore, it can be imple

Weaknesses

- In Table 1, the private finetuning baseline has a mismatched number of samples. Although the number of samples (2k) is the same for training, AUG-PE and DP-RFT use the full dataset and associated N for noise calculation for generating the 2k, while private finetuning only gets 2k for such.
- There does seem to be a certain amount of prompt engineering and reward function crafting required to get things to work, but it is the same case in private evolution.
- No results for privately finetuni

Reviewer 02·Rating 4·Confidence 3

Strengths

S1. DP-RFT integrates differentially private reinforcement fine-tuning (RFT) to train large language models (LLMs) for synthetic data generation. I like its "eyes-off" approach, where the LLM does not directly ingest private examples during training, addressing a significant practical challenge in privacy-sensitive applications.

S2. DP-RFT attempts to bridge the gap between methods offering formal privacy guarantees (like DP-SGD, which requires direct data access) and methods avoiding direct da

Weaknesses

W1. While DP-RFT generally shows better mean/max embedding similarity, its performance on FID is comparable to or worse than Aug-PE, especially for PubMed and Wildchat. I am concerned about the overall fidelity of the synthetic data. For instance, for Wildchat (ε=∞), DP-RFT has a much higher FID (0.74) compared to Aug-PE (0.39) and even the private data (0.07).

W2. The impact of R_prompt on preventing reward hacking is not empirically demonstrated or thoroughly analyzed in the main results. Th

Reviewer 03·Rating 2·Confidence 4

Strengths

- The problem is well-motivated and important for the community.
- The paper is well-written and is easy to follow.

Weaknesses

- The most important way to evaluate the performance of the proposed method is to show that it provides a better "performance" compared to (1) existing eyes-off methods and (2) methods that require direct access to raw private data during training. However, in line 244 it is mentioned that the performance evaluation is done by fine-tuning BERT_small, while the synthetic data is generated with Qwen with the help of GPT-4o (line 256)? If so, I'm very surprised by this evaluation and don't see the

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Topic Modeling · Machine Learning and Algorithms