Generating Clinically Realistic EHR Data via a Hierarchy- and Semantics-Guided Transformer
Guanglin Zhou, Sebastiano Barbieri

TL;DR
This paper introduces HiSGT, a transformer-based framework that generates synthetic EHR data by incorporating the hierarchical structure and semantic descriptions of clinical codes, improving data fidelity and downstream utility.
Contribution
The paper proposes a hierarchical and semantics-guided transformer model that leverages clinical code relationships and descriptions for more accurate EHR data generation.
Findings
HiSGT outperforms baseline models in statistical similarity to real EHRs.
Synthetic data generated by HiSGT enhances downstream disease classification.
The approach improves the clinical fidelity of synthetic electronic health records.
Abstract
Generating realistic synthetic electronic health records (EHRs) holds tremendous promise for accelerating healthcare research, facilitating AI model development and enhancing patient privacy. However, existing generative methods typically treat EHRs as flat sequences of discrete medical codes. This approach overlooks two critical aspects: the inherent hierarchical organization of clinical coding systems and the rich semantic context provided by code descriptions. Consequently, synthetic patient sequences often lack high clinical fidelity and have limited utility in downstream clinical tasks. In this paper, we propose the Hierarchy- and Semantics-Guided Transformer (HiSGT), a novel framework that leverages both hierarchical and semantic information for the generative process. HiSGT constructs a hierarchical graph to encode parent-child and sibling relationships among clinical codes and…
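To make the idea of a hierarchical graph over clinical codes concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes ICD-style dotted codes whose parent can be found by dropping the last character, and the names `parent_of` and `build_hierarchy` are hypothetical. The paper may derive its hierarchy differently (for example, from an official coding-system ontology).

```python
from collections import defaultdict
from itertools import combinations


def parent_of(code):
    """Hypothetical rule: the parent of an ICD-style code is the code with its
    last character dropped; three-character categories are treated as roots."""
    if len(code) <= 3:
        return None
    return code[:-1].rstrip(".")


def build_hierarchy(codes):
    """Return (parent_child_edges, sibling_edges) for the given codes.

    Ancestors missing from the input are added implicitly while walking up.
    """
    children = defaultdict(set)
    for code in codes:
        node = code
        parent = parent_of(node)
        while parent is not None:
            children[parent].add(node)
            node, parent = parent, parent_of(parent)

    # Directed parent -> child edges.
    parent_child = {(p, c) for p, kids in children.items() for c in kids}
    # Undirected sibling edges: every pair of codes sharing the same parent.
    siblings = {
        pair
        for kids in children.values()
        for pair in combinations(sorted(kids), 2)
    }
    return parent_child, siblings


if __name__ == "__main__":
    pc, sib = build_hierarchy(["E11.21", "E11.22", "E10.9"])
    print(sorted(pc))
    print(sorted(sib))
```

Edges like these could then feed a graph encoder whose node embeddings are combined with token embeddings; how HiSGT actually does this is not specified in the text shown here.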
