Methods for Generating and Evaluating Synthetic Longitudinal Patient Data: A Systematic Review
Katariina Perkonoja, Kari Auranen, Joni Virta

TL;DR
This paper reviews methods for creating and evaluating synthetic longitudinal patient data, highlighting gaps in privacy, evaluation standards, and real-world applicability.
Contribution
The study systematically maps methods for synthetic longitudinal patient data, identifying key challenges and gaps in privacy and evaluation.
Findings
Thirty-nine methods were identified, with only four addressing all key challenges in longitudinal data generation.
Most studies evaluated resemblance and utility, but few evaluated privacy, and few assessed all three aspects together.
No methods incorporated privacy-preserving mechanisms, and effectiveness with small sample sizes remains unclear.
Abstract
The rapid growth in data availability has facilitated research and development, yet not all industries have benefited equally due to legal and privacy constraints. The healthcare sector faces significant challenges in utilizing patient data because of concerns about data security and confidentiality. To address this, various privacy-preserving methods, including synthetic data generation, have been proposed. Synthetic data replicate existing data as closely as possible, acting as a proxy for sensitive information. While patient data are often longitudinal, this aspect remains underrepresented in existing reviews of synthetic data generation in healthcare. This paper maps and describes methods for generating and evaluating synthetic longitudinal patient data in real-life settings through a systematic literature review, conducted following the PRISMA guidelines and incorporating data from…
Taxonomy
Topics: Machine Learning in Healthcare · Electronic Health Records Systems · Chronic Disease Management Strategies
