Contrastive Learning-Based privacy metrics in Tabular Synthetic Datasets
Milton Nicolás Plasencia Palacios, Sebastiano Saccani, Gabriele Sgroi, Alexander Boudewijn, Luca Bortolussi

TL;DR
This paper introduces a contrastive learning-based approach to improve privacy metrics for synthetic tabular datasets, enabling more effective and intuitive privacy assessments in sensitive sectors like healthcare and finance.
Contribution
It proposes a novel contrastive embedding method that enhances privacy evaluation by unifying similarity and attack-based metrics for tabular data.
Findings
Contrastive embeddings improve privacy metric performance.
Simple metrics can match advanced GDPR-compliant methods.
The approach is efficient and easy to implement.
Abstract
Synthetic data has garnered attention as a Privacy Enhancing Technology (PET) in sectors such as healthcare and finance. When using synthetic data in practical applications, it is important to provide protection guarantees. In the literature, two families of approaches are proposed for tabular data. On the one hand, similarity-based methods measure the level of similarity between training and synthetic data: a privacy breach can occur if the generated data is consistently too similar or even identical to the training data. On the other hand, attack-based methods conduct deliberate attacks on synthetic datasets, and the success rates of these attacks reveal how secure the synthetic datasets are. In this paper, we introduce a contrastive method that improves privacy assessment of synthetic datasets by embedding the data in a more representative space. This overcomes obstacles…
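As a rough illustration of the similarity-based family described in the abstract, the sketch below computes, for each synthetic row, the distance to its closest training row and reports the share of synthetic rows that fall within a chosen threshold. It is not the paper's method: the function names, the threshold, and the use of Euclidean distance on already-numeric rows are all assumptions made for illustration, and the contrastive embedding the paper proposes is not implemented here (the vectors passed in could be raw features or embeddings from any model).

```python
import math

def distance_to_closest_record(synthetic, training):
    """For each synthetic row, return the Euclidean distance to its nearest training row.

    Hypothetical helper: rows are assumed to be equal-length numeric vectors.
    """
    return [min(math.dist(s, t) for t in training) for s in synthetic]

def share_of_near_copies(synthetic, training, threshold):
    """Fraction of synthetic rows lying within `threshold` of some training row.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    distances = distance_to_closest_record(synthetic, training)
    return sum(d <= threshold for d in distances) / len(distances)

# Toy data: the first synthetic row is an exact copy of a training row.
training = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
synthetic = [(0.0, 0.0), (1.0, 2.0), (10.0, 10.0)]

dcr = distance_to_closest_record(synthetic, training)
near_copy_share = share_of_near_copies(synthetic, training, threshold=0.5)
```

A high share of near copies suggests the generator may be reproducing training records. The result depends heavily on the space in which distance is measured, which is the gap the abstract says a more representative embedding is meant to address.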
Taxonomy
Topics: Privacy-Preserving Technologies in Data
Methods: Softmax · Attention Is All You Need
