Leveraging Generative AI to Enhance Synthea Module Development

Mark A. Kramer; Aanchal Mathur; Caroline E. Adams; Jason A. Walonoski

arXiv:2507.21123·cs.AI·July 30, 2025

TL;DR

This paper investigates how large language models (LLMs) can assist in developing disease modules for Synthea, an open-source synthetic health data generator. The aim is to reduce development time and required expertise, expand model diversity, and improve the quality of synthetic patient data through iterative evaluation and refinement.

Contribution

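The paper introduces an LLM-assisted approach to Synthea module creation that covers generation, evaluation, and refinement. Its central idea is progressive refinement: iteratively checking an LLM-generated module for syntactic correctness and clinical accuracy, then using the results to modify the module. It also addresses the challenges associated with this approach.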

Findings

1. LLMs can generate disease profiles and modules effectively.

2. Iterative evaluation improves module accuracy and quality.

3. Human oversight remains essential for validation.

Abstract

This paper explores the use of large language models (LLMs) to assist in the development of new disease modules for Synthea, an open-source synthetic health data generator. Incorporating LLMs into the module development process has the potential to reduce development time, reduce required expertise, expand model diversity, and improve the overall quality of synthetic patient data. We demonstrate four ways that LLMs can support Synthea module creation: generating a disease profile, generating a disease module from a disease profile, evaluating an existing Synthea module, and refining an existing module. We introduce the concept of progressive refinement, which involves iteratively evaluating the LLM-generated module by checking its syntactic correctness and clinical accuracy, and then using that information to modify the module. While the use of LLMs in this context shows promise, we…
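
The progressive refinement loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `check_module_syntax`, `progressive_refinement`, and `revise` are hypothetical, the structural checks are a simplified stand-in for real Synthea module validation, and the `revise` callable stands in for an LLM call that the caller would supply. A clinical accuracy review would be a second evaluator passed in the same way.

```python
import json
from typing import Callable, List


def check_module_syntax(module_text: str) -> List[str]:
    """Return structural problems found in a module; an empty list means none.

    Simplified, assumed checks only: valid JSON, a non-empty "states" object,
    exactly one state of type "Initial", and direct transitions that point
    at states that exist.
    """
    try:
        module = json.loads(module_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(module, dict):
        return ["top level is not a JSON object"]
    states = module.get("states")
    if not isinstance(states, dict) or not states:
        return ["module has no 'states' object"]

    problems = []
    initial = [n for n, s in states.items()
               if isinstance(s, dict) and s.get("type") == "Initial"]
    if len(initial) != 1:
        problems.append(f"expected 1 Initial state, found {len(initial)}")
    for name, state in states.items():
        target = state.get("direct_transition") if isinstance(state, dict) else None
        if target is not None and target not in states:
            problems.append(f"state '{name}' transitions to unknown state '{target}'")
    return problems


def progressive_refinement(
    module_text: str,
    revise: Callable[[str, List[str]], str],
    evaluate: Callable[[str], List[str]] = check_module_syntax,
    max_rounds: int = 3,
) -> str:
    """Evaluate the module, feed the problems to `revise`, and repeat.

    `revise` is a hypothetical hook for an LLM call that returns a new
    module text given the current text and the list of problems.
    """
    for _ in range(max_rounds):
        problems = evaluate(module_text)
        if not problems:
            break
        module_text = revise(module_text, problems)
    return module_text
```

The loop stops as soon as the evaluator reports no problems or after `max_rounds` revisions, so a module that never passes is still returned for human review rather than looping indefinitely.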
