Risk-Equalized Differentially Private Synthetic Data: Protecting Outliers by Controlling Record-Level Influence

Amir Asiaee; Chao Yan; Zachary B. Abrams; Bradley A. Malin

arXiv:2602.10232·cs.LG·February 12, 2026

Open Access

TL;DR

This paper proposes a risk-equalized differential privacy framework that reduces the influence of outliers during synthetic data generation, improving privacy protection for high-risk records.

Contribution

It introduces a two-stage method that estimates record outlierness and applies inverse weighting during DP synthesis, providing tighter privacy guarantees for sensitive outliers.

Findings

01. Reduces membership inference success on high-risk records.

02. Effectiveness depends on the quality of outlier scoring.

03. Demonstrates benefits on simulated and real-world datasets.

Abstract

When synthetic data is released, some individuals are harder to protect than others. A patient with a rare disease combination or a transaction with unusual characteristics stands out from the crowd. Differential privacy provides worst-case guarantees, but empirical attacks -- particularly membership inference -- succeed far more often against such outliers, especially under moderate privacy budgets and with auxiliary information. This paper introduces risk-equalized DP synthesis, a framework that prioritizes protection for high-risk records by reducing their influence on the learned generator. The mechanism operates in two stages: first, a small privacy budget estimates each record's "outlierness"; second, a DP learning procedure weights each record inversely to its risk score. Under Gaussian mechanisms, a record's privacy loss is proportional to its influence on the output -- so…
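The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the weighting rule `w_i = 1 / (1 + risk_i)`, the floor `w_min`, and all function names are hypothetical, and a noisy weighted sum stands in for the DP learning procedure. The risk scores are taken as given non-negative numbers; in the paper they are themselves estimated using a small share of the privacy budget, a step this sketch omits.

```python
import numpy as np


def clip_rows(x, clip_norm):
    """Scale each row so its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    factor = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return x * factor


def risk_weights(risk_scores, w_min=0.1):
    """Hypothetical rule: weight is inverse to risk, floored at w_min.

    Assumes risk_scores are non-negative, so weights lie in [w_min, 1].
    """
    w = 1.0 / (1.0 + np.asarray(risk_scores, dtype=float))
    return np.clip(w, w_min, 1.0)


def noisy_weighted_sum(x, weights, clip_norm, sigma, rng):
    """Gaussian mechanism applied to a weighted sum of clipped records.

    With the weights held fixed, adding or removing record i moves the
    noiseless sum by at most weights[i] * clip_norm in L2 norm.
    """
    clipped = clip_rows(x, clip_norm)
    total = (weights[:, None] * clipped).sum(axis=0)
    return total + rng.normal(0.0, sigma, size=total.shape)


def influence_to_noise(weights, clip_norm, sigma):
    """Per-record influence bound divided by the noise scale."""
    return np.asarray(weights, dtype=float) * clip_norm / sigma


# Toy usage: the last two records are treated as high-risk outliers.
rng = np.random.default_rng(0)
records = rng.normal(size=(4, 3))
scores = np.array([0.0, 1.0, 9.0, 99.0])
weights = risk_weights(scores)
release = noisy_weighted_sum(records, weights, clip_norm=1.0, sigma=2.0, rng=rng)
ratios = influence_to_noise(weights, clip_norm=1.0, sigma=2.0)
```

In this sketch, a record with a higher risk score gets a smaller weight and therefore a smaller influence-to-noise ratio, which is the quantity the abstract ties to per-record privacy loss under Gaussian mechanisms. The floor `w_min` is a design choice of the sketch that keeps outliers from being dropped entirely.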

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Imbalanced Data Classification Techniques · Machine Learning in Healthcare