# Who’s afraid of synthetic data? Hybrid approaches to deliver medical digital twins

**Authors:** Joel Vanin, Amit Hagar, James A. Glazier

PMC · DOI: 10.1016/j.imu.2026.101737 · Informatics in Medicine Unlocked · 2026-04-02

## TL;DR

This position paper argues that medical digital twins able to work around the data deficits of precision medicine will require hybrid systems that couple multiscale mechanistic (virtual tissue) models, synthetic data generation, and AI/ML.

## Contribution

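The paper introduces a conceptual framework for hybrid medical digital twins in which a mechanistic core is linked to AI through forward (mechanistic → synthetic data → AI), backward (AI → mechanistic), and closed (patient-anchored digital twin) loops.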

## Key findings

- Hybrid systems can generate biologically constrained synthetic cohorts to overcome data scarcity in precision medicine.
- Population- and cohort-level digital twins, used for patient stratification and short-horizon decision support, are more feasible in the near term than exact individual replicas.
- A four-layer governance framework (mechanistic cores, synthetic data products, AI components, clinical workflows) is proposed to manage risks such as bias, drift, and loss of provenance.
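The first finding, synthetic cohorts constrained by biology, can be sketched as parameter-space sampling of a mechanistic model followed by a plausibility filter. This is a hypothetical toy, not the paper's pipeline: the logistic tumor-growth model, the ranges in `PARAM_RANGES`, the constraints in `is_plausible`, and the use of Latin hypercube sampling are all illustrative assumptions.

```python
import math
import random

# Hypothetical parameter ranges for a toy logistic tumor-growth model.
# These bounds are illustrative assumptions, not values from the paper.
PARAM_RANGES = {
    "r": (0.01, 0.3),     # growth rate per day
    "K": (50.0, 500.0),   # carrying capacity (arbitrary volume units)
    "V0": (1.0, 100.0),   # initial tumor volume
}


def latin_hypercube(n, ranges, rng):
    """Draw n samples with exactly one point per equal-width stratum in each dimension."""
    samples = [{} for _ in range(n)]
    for name, (lo, hi) in ranges.items():
        # One random point inside each of the n strata, then shuffle strata across samples.
        strata = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(strata)
        for sample, u in zip(samples, strata):
            sample[name] = lo + u * (hi - lo)
    return samples


def logistic_volume(t, r, K, V0):
    """Closed-form solution of dV/dt = r*V*(1 - V/K) with V(0) = V0."""
    return K / (1.0 + ((K - V0) / V0) * math.exp(-r * t))


def is_plausible(p):
    """Hypothetical biological constraints used to reject implausible parameter sets."""
    doubling_time = math.log(2) / p["r"]  # early exponential-phase doubling time (days)
    return p["V0"] < p["K"] and 3.0 <= doubling_time <= 60.0


def synthetic_cohort(n, horizon=30, seed=0):
    """Sample n parameter sets, keep the plausible ones, and simulate each trajectory."""
    rng = random.Random(seed)
    cohort = []
    for p in latin_hypercube(n, PARAM_RANGES, rng):
        if not is_plausible(p):
            continue  # outside the assumed biological envelope
        trajectory = [logistic_volume(t, **p) for t in range(horizon + 1)]
        cohort.append({"params": p, "trajectory": trajectory})
    return cohort
```

Calling `synthetic_cohort(200)` returns only the virtual patients that pass the filter, each with its parameters and a daily volume trajectory. Latin hypercube sampling is used here because, for small sample counts, it covers each parameter range more evenly than independent uniform draws; the rejection step is where biological knowledge constrains the synthetic cohort.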

## Abstract

Despite rapidly growing volumes of clinical data, precision medicine still faces a structural data deficit: most patients and rare disease variants are sparsely sampled, labels are noisy, and counterfactual outcomes for alternative treatments are fundamentally unobservable. This position paper argues that overcoming these limits will require hybrid systems that couple multiscale virtual tissue models, synthetic data generation, and AI/ML within risk-aware digital twin frameworks. Using a structured narrative synthesis of three literatures—synthetic health data, virtual tissues and medical digital twins, and hybrid mechanistic–AI architectures including numerical weather prediction—we develop a conceptual framework centered on a mechanistic core linked to AI via forward (mechanistic → synthetic data → AI), backward (AI → mechanistic), and closed (patient-anchored digital twin) loops. We analyze how complex-systems behavior, biological adaptability, and sparse observations bound what medical digital twins can meaningfully predict, motivating ensemble and population-level forecasts rather than exact individual replicas. We then survey emerging implementation patterns, parameter-space exploration methods, and computational envelopes for using virtual tissues to generate biologically constrained synthetic cohorts and to calibrate hybrid digital twins. Finally, we adapt risk- and context-informed verification, validation, and governance frameworks to a four-layer stack spanning mechanistic cores, synthetic data products, AI components, and clinical workflows, with explicit attention to bias, drift, and provenance. We conclude that near-term impact is most likely from population- and cohort-level digital twins that support stratification and short-horizon decision support, while laying the groundwork for more individualized, trustworthy hybrids as biological and methodological uncertainties are better characterized.
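The forward loop described in the abstract (mechanistic → synthetic data → AI), together with its preference for ensemble over exact individual forecasts, can be illustrated with a deliberately small Python sketch. Everything here is an assumption for illustration rather than the authors' method: the logistic growth model, the parameter ranges, the use of day-0 and day-7 observations as inputs, and the k-nearest-neighbour surrogate standing in for the "AI" component.

```python
import math
import random
import statistics


def logistic_volume(t, r, K, V0):
    """Closed-form solution of the toy mechanistic model dV/dt = r*V*(1 - V/K)."""
    return K / (1.0 + ((K - V0) / V0) * math.exp(-r * t))


def make_synthetic_data(n, rng):
    """Forward loop, step 1: mechanistic model -> synthetic training data.

    Features are two sparse early observations (day 0 and day 7);
    the label is the day-60 volume. Parameter ranges are hypothetical.
    """
    data = []
    for _ in range(n):
        r = rng.uniform(0.02, 0.2)     # growth rate per day
        K = rng.uniform(100.0, 500.0)  # carrying capacity
        V0 = rng.uniform(1.0, 50.0)    # initial volume
        features = (V0, logistic_volume(7, r, K, V0))
        data.append((features, logistic_volume(60, r, K, V0)))
    return data


def knn_ensemble(train, query, k=15):
    """Forward loop, step 2: a k-nearest-neighbour surrogate fitted to synthetic data.

    Returns the day-60 outcomes of the k synthetic patients whose early
    observations are closest to the query: an ensemble, not a point estimate.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return [label for _, label in nearest]


def summarize(ensemble):
    """Median and 10th/90th percentiles of the ensemble forecast."""
    deciles = statistics.quantiles(ensemble, n=10)
    return {"median": statistics.median(ensemble), "p10": deciles[0], "p90": deciles[-1]}


rng = random.Random(1)
train = make_synthetic_data(2000, rng)
patient = (10.0, 25.0)  # hypothetical early observations for one patient
print(summarize(knn_ensemble(train, patient)))
```

In this toy setup the carrying capacity barely affects early growth, so two early observations cannot pin down the day-60 volume and the neighbours' outcomes spread out. Reporting that spread, rather than a single trajectory, is a small analogue of the ensemble and population-level forecasts the paper argues for.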

## Full-text entities

- **Diseases:** Tumor (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13041779/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13041779/full.md

## References

128 references; full list in the complete paper: https://tomesphere.com/paper/PMC13041779/full.md

---
Source: https://tomesphere.com/paper/PMC13041779