Memisis: Orchestrating and Evaluating Synthetic Data for Tabular Health Datasets
Nitish Nagesh, Mahdi Bagheri, Arshia Harish Puthran, Pengbao Zhou, Muhjaazee Love, Aadi Sharma, Ian Harris, Amir M. Rahmani

TL;DR
Memisis is a comprehensive tool that orchestrates and evaluates synthetic healthcare data, balancing privacy, utility, and fairness to support clinical decision-making and research.
Contribution
It introduces an interactive workflow integrating existing synthetic data tools, large language models, and evaluation metrics for healthcare datasets.
Findings
CTGAN, TVAE, and GaussianCopula perform similarly across fairness and utility metrics.
The workflow provides users with flexible control over data generation and evaluation.
Demonstrated on an open-source schizophrenia dataset with protected attributes.
Abstract
Synthetic data is widely used in healthcare to create datasets that resemble the original data without carrying the same privacy concerns. Generating and evaluating synthetic data across privacy, utility, and fairness is crucial for making high-quality data available for downstream prediction tasks and clinical decision-making. We present Memisis, a tool that orchestrates and evaluates synthetic data by leveraging existing synthetic data tools, large language models, and state-of-the-art evaluation metrics. The tool provides a unified workflow for data generation, validation, and evaluation. Users control the training size, the number of training epochs, and the number of synthetic rows to sample. Rather than exposing knobs for tuning synthetic data, the interactive agent lets users specify their synthetic data generation goals, and the tool orchestrates the workflow by leveraging…
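To make the fairness side of the evaluation concrete, here is a minimal sketch of one common fairness measure, the demographic parity gap, computed on a real table and on a synthetic one. This summary does not say which fairness metrics Memisis uses, so the metric choice, the function name `demographic_parity_gap`, the column names `sex` and `label`, and the toy rows are all assumptions made for illustration, not the tool's API or the paper's data.

```python
from collections import defaultdict


def demographic_parity_gap(rows, protected, outcome):
    """Largest difference in positive-outcome rate between any two groups.

    Hypothetical helper for illustration; not part of Memisis.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        group = row[protected]
        totals[group] += 1
        if row[outcome]:
            positives[group] += 1
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Toy rows invented for this sketch (not drawn from the schizophrenia dataset).
real = [
    {"sex": "F", "label": 1}, {"sex": "F", "label": 0},
    {"sex": "M", "label": 1}, {"sex": "M", "label": 1},
]
synthetic = [
    {"sex": "F", "label": 1}, {"sex": "F", "label": 0},
    {"sex": "M", "label": 1}, {"sex": "M", "label": 0},
]

real_gap = demographic_parity_gap(real, "sex", "label")
synthetic_gap = demographic_parity_gap(synthetic, "sex", "label")
print(f"real gap: {real_gap:.2f}, synthetic gap: {synthetic_gap:.2f}")
```

Comparing the two gaps shows whether a generator preserved, widened, or narrowed a disparity that exists in the original data. A real evaluation would pair a measure like this with utility and privacy metrics.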
