Synthcity: facilitating innovative use cases of synthetic data in different data modalities

Zhaozhi Qian; Bogdan-Constantin Cebere; Mihaela van der Schaar

arXiv:2301.07573·cs.LG·January 19, 2023·25 cites

TL;DR

Synthcity is an open-source toolkit that enables innovative applications of synthetic data across various data types, supporting research, experimentation, and benchmarking in ML fairness, privacy, and augmentation.

Contribution

It provides a unified platform for synthetic data generation and benchmarking across diverse data modalities, facilitating research and practical applications.

Findings

01 Supports multiple data modalities, including time series and censored data

02 Offers state-of-the-art benchmarks for synthetic data quality

03 Enables rapid experimentation and community collaboration

Abstract

Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy, and augmentation across diverse tabular data modalities, including static data, regular and irregular time series, data with censoring, multi-source data, composite data, and more. Synthcity provides practitioners with a single access point to cutting-edge research and tools in synthetic data. It also offers the community a playground for rapid experimentation and prototyping, a one-stop shop for SOTA benchmarks, and an opportunity to extend research impact. The library is available on GitHub (https://github.com/vanderschaarlab/synthcity) and on PyPI via pip (https://pypi.org/project/synthcity/). We warmly invite the community to join the development effort by providing feedback, reporting bugs, and contributing code.
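To make the generate-then-benchmark workflow concrete, here is a minimal sketch in plain Python. It does not use Synthcity and does not reproduce its API: the class `GaussianColumnSampler`, the function `mean_gap`, and the two-column toy dataset are all hypothetical names invented for this illustration. The sketch only shows the general shape of the loop the abstract describes: fit a generator on real static tabular data, draw synthetic rows, then score how closely the synthetic data tracks the real data.

```python
import random
import statistics


class GaussianColumnSampler:
    """Hypothetical toy generator (not part of Synthcity).

    Models every numeric column as an independent Gaussian.
    """

    def fit(self, rows):
        # Transpose rows into columns and record each column's mean and stdev.
        columns = list(zip(*rows))
        self.params = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
        return self

    def generate(self, count, seed=0):
        # Draw `count` synthetic rows from the fitted per-column Gaussians.
        rng = random.Random(seed)
        return [
            tuple(rng.gauss(mu, sigma) for mu, sigma in self.params)
            for _ in range(count)
        ]


def mean_gap(real, synthetic):
    """Hypothetical fidelity score: largest absolute gap between column means."""
    real_means = [statistics.mean(c) for c in zip(*real)]
    syn_means = [statistics.mean(c) for c in zip(*synthetic)]
    return max(abs(r - s) for r, s in zip(real_means, syn_means))


# Toy "real" dataset: 500 rows with two numeric columns (assumed for illustration).
data_rng = random.Random(42)
real = [(data_rng.gauss(50, 10), data_rng.gauss(120, 15)) for _ in range(500)]

model = GaussianColumnSampler().fit(real)
synthetic = model.generate(count=500, seed=1)
gap = mean_gap(real, synthetic)
print(f"rows generated: {len(synthetic)}, largest column-mean gap: {gap:.3f}")
```

A real toolkit would swap the toy sampler for a learned generative model and the single score for a suite of fidelity, utility, and privacy metrics; the fit, generate, and evaluate steps stay the same.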

Taxonomy

Topics: Advanced Data Storage Technologies · Privacy-Preserving Technologies in Data · Data Quality and Management

Methods: Lib