Evaluation metrics for temporal preservation in synthetic longitudinal patient data
Katariina Perkonoja, Parisa Movahedi, Antti Airola, Kari Auranen, Joni Virta

TL;DR
This paper proposes a comprehensive set of metrics to evaluate how well synthetic longitudinal patient data preserve temporal structures, addressing limitations of single-metric assessments and guiding improvements in generative models.
Contribution
It introduces multidimensional metrics for assessing temporal preservation in synthetic longitudinal data, highlighting factors affecting data quality and limitations of existing evaluation methods.
Findings
Strong marginal resemblance can hide covariance distortions
Measurement frequency and preprocessing impact temporal fidelity
No single metric suffices; a multidimensional approach is necessary
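The first finding can be illustrated with a toy example. The sketch below is not the paper's metric set; it is a minimal simulation under stated assumptions: the "real" data are AR(1) trajectories, and the "synthetic" data are produced by shuffling values across patients within each visit. All function names and parameters (`simulate_ar1`, `shuffle_within_visits`, `lag1_correlation`, `phi`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_ar1(n_patients, n_visits, phi, rng):
    """Simulate stationary AR(1) trajectories, one row per patient (toy 'real' data)."""
    x = np.empty((n_patients, n_visits))
    x[:, 0] = rng.normal(size=n_patients)
    noise_sd = np.sqrt(1.0 - phi ** 2)  # keeps unit variance at every visit
    for t in range(1, n_visits):
        x[:, t] = phi * x[:, t - 1] + noise_sd * rng.normal(size=n_patients)
    return x

def shuffle_within_visits(x, rng):
    """Permute values across patients independently at each visit.

    Per-visit marginal distributions are preserved exactly, but the link
    between a patient's consecutive measurements is broken (toy 'synthetic' data).
    """
    out = x.copy()
    for t in range(x.shape[1]):
        out[:, t] = rng.permutation(out[:, t])
    return out

def lag1_correlation(x):
    """Pearson correlation between consecutive visits, pooled over all visit pairs."""
    return np.corrcoef(x[:, :-1].ravel(), x[:, 1:].ravel())[0, 1]

rng = np.random.default_rng(0)
real = simulate_ar1(n_patients=2000, n_visits=6, phi=0.8, rng=rng)
synthetic = shuffle_within_visits(real, rng)

# Marginal check: largest difference between sorted per-visit values.
marginal_gap = np.max(np.abs(np.sort(real, axis=0) - np.sort(synthetic, axis=0)))

# Covariance check: correlation between consecutive visits.
real_corr = lag1_correlation(real)
synth_corr = lag1_correlation(synthetic)

print(f"max marginal gap:       {marginal_gap:.3f}")
print(f"lag-1 corr (real):      {real_corr:.3f}")
print(f"lag-1 corr (synthetic): {synth_corr:.3f}")
```

In this construction a marginal-only comparison reports a perfect match, while the lag-1 correlation shows that the temporal dependence has been lost, which is why a single metric can be misleading.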
Abstract
This study introduces a set of metrics for evaluating temporal preservation in synthetic longitudinal patient data, defined as artificially generated data that mimic real patients' repeated measurements over time. The proposed metrics assess how well synthetic data reproduce key temporal characteristics, categorized into marginal, covariance, individual-level, and measurement structures. We show that strong marginal-level resemblance may conceal distortions in covariance and disruptions in individual-level trajectories. Temporal preservation is influenced by factors such as original data quality, measurement frequency, and preprocessing strategies, including binning, variable encoding, and precision. Variables with sparse or highly irregular measurement times provide limited information for learning temporal dependencies, resulting in reduced resemblance between the synthetic and original…
Taxonomy
Topics: Machine Learning in Healthcare · Electronic Health Records Systems · Generative Adversarial Networks and Image Synthesis
