CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines
Chao Pang, Xinzhuo Jiang, Nishanth Parameshwar Pavinkurve, Krishna S. Kalluri, Elise L. Minto, Jason Patterson, Linying Zhang, George Hripcsak, Gamze Gürsoy, Noémie Elhadad, Karthik Natarajan

TL;DR
This paper introduces CEHR-GPT, a novel method that trains GPT models on patient timelines to generate synthetic EHR data that preserves temporal dependencies and can be converted to the OMOP Common Data Model format.
Contribution
The work presents a new approach that combines a patient representation derived from CEHR-BERT with a GPT model to generate realistic, temporally coherent synthetic EHR sequences that can be converted to the OMOP format.
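To make "patient timeline" concrete, here is a minimal sketch of how a patient's visits might be flattened into one chronological token sequence for a GPT model. It assumes a representation loosely modeled on CEHR-BERT's artificial time tokens: a demographic prompt, VS/VE tokens marking visit start and end, and gap tokens (D, W, M, LT) between visits. The token names, bucket boundaries, and functions `time_token` and `encode_timeline` are hypothetical illustrations, not the paper's exact encoding.

```python
from datetime import date


def time_token(gap_days: int) -> str:
    """Bucket the gap between consecutive visits into an artificial time token.

    Hypothetical scheme, loosely modeled on CEHR-BERT: D<n> for gaps under a week,
    W<n> under roughly a month, M<n> under roughly a year, LT beyond that.
    """
    if gap_days < 7:
        return f"D{gap_days}"
    if gap_days < 30:
        return f"W{gap_days // 7}"
    if gap_days < 360:
        return f"M{gap_days // 30}"
    return "LT"


def encode_timeline(birth_year, gender, race, visits):
    """Flatten a patient's visits into one chronological token sequence.

    visits: list of (visit_date, visit_type, [concept codes]) tuples.
    """
    visits = sorted(visits, key=lambda v: v[0])  # enforce chronological order
    first_date = visits[0][0]
    # Demographic prompt: start year, approximate age at first visit, gender, race.
    tokens = [f"year:{first_date.year}", f"age:{first_date.year - birth_year}", gender, race]
    prev_date = None
    for visit_date, visit_type, concepts in visits:
        if prev_date is not None:
            # Insert a time token so the gap between visits is explicit in the sequence.
            tokens.append(time_token((visit_date - prev_date).days))
        # VS/VE mark visit start and end; the visit's concepts sit in between.
        tokens += ["VS", visit_type, *concepts, "VE"]
        prev_date = visit_date
    return tokens


# Hypothetical patient with two visits (concept codes c1..c3 are placeholders).
timeline = [
    (date(2020, 1, 15), "inpatient", ["c3"]),
    (date(2020, 1, 1), "outpatient", ["c1", "c2"]),
]
print(encode_timeline(1970, "F", "race:unknown", timeline))
```

For these two visits, the 14-day gap becomes the token W2, so the ordering and spacing of visits survive in a form a left-to-right language model can learn, unlike a flat tabular row per patient.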
Findings
Successfully trained GPT on patient timelines
Generated synthetic EHR data with preserved temporal dependencies
Data compatible with OMOP format
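OMOP compatibility implies mapping a generated token sequence back onto dated records. The sketch below shows one way that reverse step could work, under stated assumptions. Gaps between visits are assumed to be encoded as tokens like D3 (days), W2 (weeks), M1 (months), or LT (long-term); visits are assumed to be wrapped in VS ... VE with the visit type right after VS; the sequence is assumed to start with four demographic tokens. An absolute start date is supplied by the caller because the tokens only carry relative gaps. The output rows are simplified dictionaries that loosely echo OMOP column names such as `visit_occurrence_id` and `visit_start_date`. They are not full OMOP CDM tables, and `decode`, `gap_days`, and `concept_code` are hypothetical names.

```python
from datetime import date, timedelta


def is_time_token(tok: str) -> bool:
    """True for hypothetical gap tokens such as D3, W2, M5, or LT."""
    return tok == "LT" or (tok[:1] in ("D", "W", "M") and tok[1:].isdigit())


def gap_days(tok: str) -> int:
    """Approximate inverse of the bucketing; the exact gap is not recoverable."""
    if tok == "LT":
        return 365  # assumption: treat the long-term bucket as one year
    return int(tok[1:]) * {"D": 1, "W": 7, "M": 30}[tok[0]]


def decode(tokens, start_date, n_demographic=4):
    """Turn a token sequence back into simplified visit and event rows."""
    visit_rows, event_rows = [], []
    current = start_date
    visit_id = None
    expect_type = False
    for tok in tokens[n_demographic:]:  # skip the demographic prompt
        if tok == "VS":
            visit_id = len(visit_rows) + 1
            expect_type = True
        elif expect_type:
            # The token right after VS is taken as the visit type.
            visit_rows.append({"visit_occurrence_id": visit_id,
                               "visit_type": tok,
                               "visit_start_date": current})
            expect_type = False
        elif tok == "VE":
            visit_id = None
        elif visit_id is None and is_time_token(tok):
            # Gap tokens between visits advance the running date.
            current += timedelta(days=gap_days(tok))
        elif visit_id is not None:
            # Any other token inside a visit is treated as a clinical concept.
            event_rows.append({"visit_occurrence_id": visit_id,
                               "concept_code": tok,
                               "event_date": current})
    return visit_rows, event_rows


tokens = ["year:2020", "age:50", "F", "race:unknown",
          "VS", "outpatient", "c1", "c2", "VE", "W2",
          "VS", "inpatient", "c3", "VE"]
visits, events = decode(tokens, date(2020, 1, 1))
print(visits)
print(events)
```

Here the W2 token places the second visit 14 days after the first, on 2020-01-15. Because gap tokens are bucketed, a real converter would need a policy (for example, sampling within each bucket) to assign exact dates.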
Abstract
Synthetic Electronic Health Records (EHR) have emerged as a pivotal tool in advancing healthcare applications and machine learning models, particularly for researchers without direct access to healthcare data. Although existing methods, like rule-based approaches and generative adversarial networks (GANs), generate synthetic data that resembles real-world EHR data, these methods often use a tabular format, disregarding temporal dependencies in patient histories and limiting data replication. Recently, there has been a growing interest in leveraging Generative Pre-trained Transformers (GPT) for EHR data. This enables applications like disease progression analysis, population estimation, counterfactual reasoning, and synthetic data generation. In this work, we focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient…
Taxonomy
Topics: Machine Learning in Healthcare · Electronic Health Records Systems
Methods: Attention Is All You Need · Cosine Annealing · Weight Decay · Linear Layer · Byte Pair Encoding · Discriminative Fine-Tuning · Multi-Head Attention · Attention Dropout · Residual Connection
