Methods for generating and evaluating synthetic longitudinal patient data: a systematic review
Katariina Perkonoja, Kari Auranen, Joni Virta

TL;DR
This systematic review examines methods for generating and evaluating synthetic longitudinal patient data in healthcare, highlighting gaps in privacy-preserving techniques and in the comprehensive assessment of data resemblance, utility, and privacy.
Contribution
The review provides a comprehensive mapping of existing methods for synthetic longitudinal data generation and evaluation, identifying gaps and areas for future research.
Findings
39 methods identified for synthetic longitudinal data generation
Few methods incorporate privacy-preserving mechanisms
Most studies evaluate resemblance and utility, but privacy assessment is limited
Abstract
The rapid growth in data availability has facilitated research and development, yet not all industries have benefited equally due to legal and privacy constraints. The healthcare sector faces significant challenges in utilizing patient data because of concerns about data security and confidentiality. To address this, various privacy-preserving methods, including synthetic data generation, have been proposed. Synthetic data replicate existing data as closely as possible, acting as a proxy for sensitive information. While patient data are often longitudinal, this aspect remains underrepresented in existing reviews of synthetic data generation in healthcare. This paper maps and describes methods for generating and evaluating synthetic longitudinal patient data in real-life settings through a systematic literature review, conducted following the PRISMA guidelines and incorporating data from…
