PaReGTA: An LLM-based EHR Data Encoding Approach to Capture Temporal Information
Kihyuk Yoon, Lingchao Mao, Catherine Chong, Todd J. Schwedt, Chia-Chun Chiang, Jing Li

TL;DR
PaReGTA is an LLM-based framework for encoding longitudinal EHR data that preserves temporal information and improves classification performance in data-limited settings without training a model from scratch.
Contribution
PaReGTA introduces an encoding method that combines visit-level templated text, contrastive fine-tuning of a sentence-embedding model, and hybrid temporal pooling, enhancing interpretability and performance over traditional methods.
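As a rough illustration of the templated-text step, the sketch below renders one visit as a sentence with an explicit temporal cue. The function name `visit_to_text`, the sentence template, and the `(kind, name)` event format are all assumptions made for this example; the paper's actual templates are not given in this summary.

```python
from datetime import date


def visit_to_text(visit_date, index_date, events):
    """Render one visit as templated text (hypothetical template).

    events is a list of (kind, name) pairs, e.g. ("diagnosis", "migraine").
    The temporal cue is the number of days between the visit and an
    index date; this choice of cue is an assumption, not the paper's.
    """
    days_before = (index_date - visit_date).days
    described = "; ".join(f"{kind}: {name}" for kind, name in events)
    return f"Visit {days_before} days before the index date. {described}."


example = visit_to_text(
    date(2023, 1, 10),
    date(2023, 3, 1),
    [("diagnosis", "migraine without aura"), ("medication", "sumatriptan")],
)
print(example)
```

Text of this kind could then be passed to any sentence-embedding model to obtain one vector per visit.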
Findings
Outperforms sparse baselines in migraine classification
More stable than deep sequential models in limited data scenarios
Utilizes pre-trained LLMs for effective encoding
Abstract
Temporal information in structured electronic health records (EHRs) is often lost in sparse one-hot or count-based representations, while sequence models can be costly and data-hungry. We propose PaReGTA, an LLM-based encoding framework that (i) converts longitudinal EHR events into visit-level templated text with explicit temporal cues, (ii) learns domain-adapted visit embeddings via lightweight contrastive fine-tuning of a sentence-embedding model, and (iii) aggregates visit embeddings into a fixed-dimensional patient representation using hybrid temporal pooling that captures both recency and globally informative visits. Because PaReGTA does not require training from scratch but instead utilizes a pre-trained LLM, it can perform well even in data-limited cohorts. Furthermore, PaReGTA is model-agnostic and can benefit from future EHR-specialized sentence-embedding models. For…
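Step (iii) of the abstract can be sketched as follows. This is only a minimal illustration: the abstract does not give the pooling formula, so the exponential recency weighting, the element-wise max used as a stand-in for "globally informative visits", the concatenation, and the names `hybrid_temporal_pool` and `decay` are all assumptions.

```python
import math


def hybrid_temporal_pool(embeddings, days_before_index, decay=0.01):
    """Pool visit embeddings into one fixed-length patient vector.

    Assumed design (not taken from the paper):
      - recency part: weighted mean with weights proportional to
        exp(-decay * days_before_index), so recent visits dominate;
      - global part: element-wise max over all visits, ignoring time;
      - output: concatenation of the two parts (length 2 * dim).
    """
    if not embeddings or len(embeddings) != len(days_before_index):
        raise ValueError("need one non-empty embedding list and matching day offsets")
    dim = len(embeddings[0])

    # Normalised exponential-decay weights over visits.
    raw = [math.exp(-decay * d) for d in days_before_index]
    total = sum(raw)
    weights = [r / total for r in raw]

    recency = [sum(w * e[j] for w, e in zip(weights, embeddings)) for j in range(dim)]
    global_part = [max(e[j] for e in embeddings) for j in range(dim)]
    return recency + global_part


# Two visits with 2-dimensional toy embeddings: one today, one 1000 days ago.
patient = hybrid_temporal_pool([[1.0, 0.0], [0.0, 1.0]], [0, 1000])
```

Whatever the number of visits, the output length depends only on the embedding dimension, which is what lets a standard classifier consume the patient representation.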
Taxonomy
Topics: Machine Learning in Healthcare · Electronic Health Records Systems · Biomedical Text Mining and Ontologies
