Evaluating LLM Abilities to Understand Tabular Electronic Health Records: A Comprehensive Study of Patient Data Extraction and Retrieval

Jesus Lovon (IRIT-IRIS); Martin Mouysset (IRIT-IRIS); Jo Oleiwan (IRIT-IRIS); Jose G. Moreno (IRIT-IRIS); Christine Damase-Michel; Lynda Tamine (IRIT-IRIS)

arXiv:2501.09384·cs.CL·January 17, 2025

TL;DR

This comprehensive study evaluates how well large language models understand electronic health records for patient data extraction, highlighting the importance of prompt design and demonstration selection to improve performance.

Contribution

First investigation into LLMs' ability to comprehend EHRs for patient data extraction and retrieval, providing guidelines for model design in health search applications.

Findings

01

Optimal feature selection and serialization improve task performance by up to 26.79% compared to naive approaches.

02

In-context learning with relevant example selection improves data extraction performance by 5.95%.

03

Guidelines proposed for designing LLM-based health search models.
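
Finding 02 turns on picking demonstrations that are relevant to the incoming question. The sketch below illustrates the general idea only: it ranks a pool of question/answer pairs by token-overlap (Jaccard) similarity to the query. The similarity measure, the function names, and the `question`/`answer` keys are assumptions made for illustration and are not taken from the paper, which may use a different relevance criterion.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strings (assumed relevance measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_demonstrations(query: str, pool: list, k: int = 2) -> list:
    """Return the k pool entries whose questions are most similar to the query."""
    ranked = sorted(pool, key=lambda d: jaccard(query, d["question"]), reverse=True)
    return ranked[:k]


# Hypothetical demonstration pool; the questions and answers are made up.
pool = [
    {"question": "what is the age of patient 42", "answer": "67"},
    {"question": "list all diagnoses for patient 7", "answer": "sepsis"},
    {"question": "what is the gender of patient 42", "answer": "F"},
]
demos = select_demonstrations("what is the age of patient 99", pool, k=2)
```

The selected pairs would then be placed before the target question in the prompt; swapping `jaccard` for an embedding-based similarity keeps the rest of the function unchanged.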

Abstract

Electronic Health Record (EHR) tables pose unique challenges among which is the presence of hidden contextual dependencies between medical features with a high level of data dimensionality and sparsity. This study presents the first investigation into the abilities of LLMs to comprehend EHRs for patient data extraction and retrieval. We conduct extensive experiments using the MIMICSQL dataset to explore the impact of the prompt structure, instruction, context, and demonstration, of two backbone LLMs, Llama2 and Meditron, based on task performance. Through quantitative and qualitative analyses, our findings show that optimal feature selection and serialization methods can enhance task performance by up to 26.79% compared to naive approaches. Similarly, in-context learning setups with relevant example selection improve data extraction performance by 5.95%. Based on our study findings, we…
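
The abstract attributes the largest gain to feature selection and serialization, that is, choosing which table columns reach the prompt and how they are rendered as text. The following is a minimal sketch of that idea under stated assumptions: the column names, the "name is value" template, and the prompt layout are hypothetical and do not reproduce the paper's actual serialization methods or the MIMICSQL schema.

```python
def serialize_row(row: dict, features: list) -> str:
    """Render the selected features of one table row as text, skipping empty cells."""
    parts = [
        f"{name} is {row[name]}"
        for name in features
        if row.get(name) not in (None, "")  # sparse EHR tables have many empty cells
    ]
    return "; ".join(parts)


def build_prompt(instruction: str, row: dict, features: list, question: str) -> str:
    """Assemble instruction, serialized patient context, and question into one prompt."""
    context = serialize_row(row, features)
    return f"{instruction}\nPatient record: {context}\nQuestion: {question}\nAnswer:"


# Hypothetical patient row; values are made up for illustration.
row = {"subject_id": 42, "age": 67, "gender": "F", "insurance": None}
prompt = build_prompt(
    "Answer the question using the patient record.",
    row,
    features=["age", "gender", "insurance"],
    question="How old is the patient?",
)
```

Restricting `features` to the columns relevant to the question keeps the prompt short and avoids flooding the model with empty or unrelated fields, which is one plausible reading of why selection matters for high-dimensional, sparse tables.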

Code & Models

Repositories

jeslev/llm-patient-ehr
PyTorch · Official

Taxonomy

Methods: Feature Selection