SynthGuard: Redefining Synthetic Data Generation with a Scalable and Privacy-Preserving Workflow Framework
Eduardo Brito, Mahmoud Shoush, Kristian Tamm, Paula Etti, Liina Kamm

TL;DR
SynthGuard is a framework that enables secure, privacy-preserving, and scalable synthetic data generation workflows, giving data owners control and ensuring compliance across diverse environments.
Contribution
SynthGuard introduces a modular, privacy-preserving workflow framework for synthetic data generation (SDG) that lets data owners maintain data sovereignty and regulatory compliance.
Findings
Balances security, privacy, and scalability.
Supports auditable and reproducible SDG workflows.
Validated with real-world use cases.
Abstract
The growing reliance on data-driven applications in sectors such as healthcare, finance, and law enforcement underscores the need for secure, privacy-preserving, and scalable mechanisms for data generation and sharing. Synthetic data generation (SDG) has emerged as a promising approach but often relies on centralized or external processing, raising concerns about data sovereignty, domain ownership, and compliance with evolving regulatory standards. To overcome these issues, we introduce SynthGuard, a framework designed to ensure computational governance by enabling data owners to maintain control over SDG workflows. SynthGuard supports modular and privacy-preserving workflows, ensuring secure, auditable, and reproducible execution across diverse environments. In this paper, we demonstrate how SynthGuard addresses the complexities at the intersection of domain-specific needs and scalable…
