The challenge of generating and evolving real-life like synthetic test data without accessing real-world raw data -- a Systematic Review
Maj-Annika Tammisto, Faiz Ali Shah, Daniel Rodriguez, Dietmar Pfahl

TL;DR
This systematic review examines methods for creating and evolving privacy-preserving synthetic test data without using real raw data, and highlights gaps in current approaches, especially in data evolution.
Contribution
It synthesizes existing approaches for privacy-preserving synthetic data generation and identifies the lack of focus on data evolution in current research.
Findings
37 approaches partially address the research question
Most methods require access to real data for anonymization
Data evolution in synthetic datasets is underexplored
Abstract
Background: High-level system testing of applications that use data from e-Government services as input requires test data that is real-life-like but where the privacy of personal information is guaranteed. Applications with such strong requirements include information exchange between countries, medicine, banking, etc. This review aims to synthesize the current state-of-the-practice in this domain. Objectives: The objective of this Systematic Review is to identify existing approaches for creating and evolving synthetic test data without using real-life raw data. Methods: We followed well-known methodologies for conducting systematic literature reviews, including the ones from Kitchenham, as well as guidelines for analysing the limitations of our review and its threats to validity. Results: A variety of methods and tools exist for creating privacy-preserving test data. Our search…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Big Data and Digital Economy
