DP-2Stage: Adapting Language Models as Differentially Private Tabular Data Generators

Tejumade Afonja; Hui-Po Wang; Raouf Kerkouche; Mario Fritz

arXiv:2412.02467·cs.LG·April 30, 2025

TL;DR

This paper introduces DP-2Stage, a two-stage fine-tuning framework for generating synthetic tabular data with differential privacy using large language models, improving data quality under privacy constraints.

Contribution

The paper proposes a novel two-stage fine-tuning method that enhances differentially private tabular data generation with large language models.

Findings

01. DP-2Stage outperforms direct fine-tuning in DP settings.

02. Two-stage approach improves data coherence and utility.

03. Framework effectively balances privacy and data quality.

Abstract

Generating tabular data under differential privacy (DP) protection ensures theoretical privacy guarantees but poses challenges for training machine learning models, primarily due to the need to capture complex structures under noisy supervision signals. Recently, pre-trained Large Language Models (LLMs) -- even those at the scale of GPT-2 -- have demonstrated great potential in synthesizing tabular data. However, their applications under DP constraints remain largely unexplored. In this work, we address this gap by applying DP techniques to the generation of synthetic tabular data. Our findings show that LLMs face difficulties in generating coherent text when fine-tuned with DP, as privacy budgets are inefficiently allocated to non-private elements like table structures. To overcome this, we propose DP-2Stage, a two-stage fine-tuning framework for differentially private tabular data…
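The abstract does not say which DP mechanism is used for fine-tuning. As a rough illustration of the "noisy supervision signals" it mentions, the sketch below shows the clip-and-noise gradient aggregation used in DP-SGD, a common choice for DP training. This is an assumption for illustration, not the authors' implementation; the function names `clip_to_norm` and `private_mean_gradient` and all parameter values are hypothetical.

```python
import math
import random


def clip_to_norm(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    Illustrative DP-SGD-style aggregation (an assumption, not taken from
    the paper): the noise standard deviation is noise_multiplier * max_norm.
    """
    clipped = [clip_to_norm(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


# Hypothetical per-example gradients: the first one exceeds the clip norm.
grads = [[3.0, 4.0], [0.3, 0.4]]
rng = random.Random(0)
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping bounds how much any single record can move the update, and the added noise masks what remains. The noise is applied to every gradient coordinate regardless of whether it relates to private values or to shared table structure, which is one way to read the abstract's remark about inefficient use of the privacy budget.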

Code & Models

Repositories

tejuafonja/dp-2stage (PyTorch, official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data

Methods: Dense Connections · Layer Normalization · Linear Layer · Discriminative Fine-Tuning · Weight Decay · Attention Dropout · Residual Connection · Adam · Attention Is All You Need