Synthcity: facilitating innovative use cases of synthetic data in different data modalities
Zhaozhi Qian, Bogdan-Constantin Cebere, Mihaela van der Schaar

TL;DR
Synthcity is an open-source toolkit that enables innovative applications of synthetic data across various data types, supporting research, experimentation, and benchmarking in ML fairness, privacy, and augmentation.
Contribution
It provides a unified platform for synthetic data generation and benchmarking across diverse data modalities, facilitating research and practical applications.
Findings
Supports multiple tabular data modalities, including static data, regular and irregular time series, and data with censoring
Offers state-of-the-art benchmarks for synthetic data quality
Enables rapid experimentation and community collaboration
Abstract
Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy, and augmentation across diverse tabular data modalities, including static data, regular and irregular time series, data with censoring, multi-source data, composite data, and more. Synthcity provides practitioners with a single access point to cutting-edge research and tools in synthetic data. It also offers the community a playground for rapid experimentation and prototyping, a one-stop shop for SOTA benchmarks, and an opportunity for extending research impact. The library is available on GitHub (https://github.com/vanderschaarlab/synthcity) and on PyPI for installation with pip (https://pypi.org/project/synthcity/). We warmly invite the community to join the development effort by providing feedback, reporting bugs, and contributing code.
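The generate-then-benchmark workflow described in the abstract can be illustrated with a toy sketch. It uses only the Python standard library and is not Synthcity's API: the class `GaussianMarginalGenerator`, the function `mean_gap`, and the two-column dataset are hypothetical names and data invented for illustration. It assumes that each column can be modelled by an independent normal distribution, which is far simpler than the generators the package provides.

```python
import random
import statistics


class GaussianMarginalGenerator:
    """Toy generator (hypothetical, not part of Synthcity).

    Fits an independent normal distribution to each column of a table.
    """

    def fit(self, rows):
        # Transpose the rows into columns and store (mean, stdev) per column.
        columns = list(zip(*rows))
        self.params = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
        return self

    def generate(self, count, seed=0):
        # Sample each column independently from its fitted normal distribution.
        rng = random.Random(seed)
        return [
            tuple(rng.gauss(mu, sigma) for mu, sigma in self.params)
            for _ in range(count)
        ]


def mean_gap(real, synthetic):
    """Simple fidelity score: largest absolute gap between column means."""
    real_means = [statistics.mean(c) for c in zip(*real)]
    syn_means = [statistics.mean(c) for c in zip(*synthetic)]
    return max(abs(r - s) for r, s in zip(real_means, syn_means))


# Stand-in for a real static tabular dataset (two made-up numeric columns).
data_rng = random.Random(42)
real_rows = [(data_rng.gauss(50.0, 10.0), data_rng.gauss(120.0, 15.0)) for _ in range(500)]

generator = GaussianMarginalGenerator().fit(real_rows)
synthetic_rows = generator.generate(count=500, seed=1)
gap = mean_gap(real_rows, synthetic_rows)
print(f"rows generated: {len(synthetic_rows)}, largest mean gap: {gap:.2f}")
```

A real benchmark would swap in stronger generators and richer metrics (for example privacy or downstream-utility scores), but the fit, generate, and evaluate steps keep the same shape.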
Taxonomy
Topics: Advanced Data Storage Technologies · Privacy-Preserving Technologies in Data · Data Quality and Management
Methods: Lib
