Generate synthetic samples from tabular data

David Banh; Alan Huang

arXiv:2209.06113·cs.LG·December 26, 2022

TL;DR

This paper discusses a method for generating synthetic tabular data to address privacy concerns, improve data sharing, and reduce costs associated with data collection and invasive procedures.

Contribution

It introduces a novel approach for creating statistically robust synthetic samples from tabular data to enhance privacy and data sharing practices.

Findings

1. Synthetic samples improve privacy preservation.

2. The method reduces the cost of data collection.

3. The method enhances data sharing without compromising privacy.

Abstract

Generating new samples from data sets can reduce the need for additional expensive operations and invasive procedures, and can mitigate privacy issues. These novel, statistically robust samples can serve as a temporary, intermediate replacement for the original data when privacy is a concern. The method can enable better data sharing practices without the identification issues or biases that an adversarial attack could exploit.

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Data Quality and Management