FeatEHR-LLM: Leveraging Large Language Models for Feature Engineering in Electronic Health Records

Hojjat Karami; David Atienza; Jean-Philippe Thiran; Anisoara Ionescu

arXiv:2604.22534 · cs.LG · April 27, 2026

TL;DR

FeatEHR-LLM is a framework that uses Large Language Models to generate clinically meaningful tabular features from irregularly sampled EHR time series. It improves clinical prediction performance while limiting privacy exposure: the LLM sees only dataset schemas and task descriptions, not raw patient records.

Contribution

The paper presents a new LLM-based approach for feature engineering in EHR data that handles irregular sampling and sparsity without exposing raw patient data.

Findings

01 Achieved highest mean AUROC on 7 out of 8 clinical prediction tasks.

02 Improved prediction accuracy by up to 6 percentage points over strong baselines.

03 Supports both univariate and multivariate feature generation with validation-in-the-loop.
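
As a rough illustration of what validation-in-the-loop could look like, the sketch below scores each candidate feature on a held-out validation split and keeps only those that clear an AUROC threshold. This summary does not state the paper's actual acceptance criterion, so the univariate AUROC filter, the `min_auroc` threshold, and the names `auroc`, `filter_candidates`, `lactate_slope`, and `noise` are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def filter_candidates(
    candidates: Dict[str, Sequence[float]],
    labels: Sequence[int],
    min_auroc: float = 0.6,  # hypothetical threshold, not taken from the paper
) -> Dict[str, float]:
    """Keep candidate features whose validation AUROC clears the threshold."""
    kept = {}
    for name, values in candidates.items():
        score = auroc(values, labels)
        # A feature that ranks patients in reverse is equally informative.
        score = max(score, 1.0 - score)
        if score >= min_auroc:
            kept[name] = score
    return kept


# Hypothetical validation split: four patients, two candidate features.
val_labels = [0, 0, 1, 1]
val_candidates = {
    "lactate_slope": [0.1, 0.2, 0.8, 0.9],
    "noise": [0.5, 0.9, 0.9, 0.5],
}
kept_features = filter_candidates(val_candidates, val_labels)
```

In a fuller loop, the rejected feature names and their scores could be fed back to the LLM as context for the next round of generation; that feedback step is likewise an assumption here.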

Abstract

Feature engineering for Electronic Health Records (EHR) is complicated by irregular observation intervals, variable measurement frequencies, and structural sparsity inherent to clinical time series. Existing automated methods either lack clinical domain awareness or assume clean, regularly sampled inputs, limiting their applicability to real-world EHR data. We present FeatEHR-LLM, a framework that leverages Large Language Models (LLMs) to generate clinically meaningful tabular features from irregularly sampled EHR time series. To limit patient privacy exposure, the LLM operates exclusively on dataset schemas and task descriptions rather than raw patient records. A tool-augmented generation mechanism equips the LLM with specialized routines for querying irregular temporal data, enabling it to produce executable feature-extraction code that explicitly handles uneven observation…
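
To make the idea of feature-extraction code that accounts for uneven observation times concrete, here is a minimal sketch of the kind of function such a pipeline might emit. It is not taken from the paper or its repository: the `(time, value)` observation format, the function names, and the choice of features (a time-weighted mean and the time since the last measurement) are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed format: (hours since admission, measured value)
Observation = Tuple[float, float]


def time_weighted_mean(obs: List[Observation]) -> Optional[float]:
    """Trapezoidal average over the observed span, so dense bursts do not dominate."""
    if not obs:
        return None  # structural sparsity: the variable was never measured
    ordered = sorted(obs)
    span = ordered[-1][0] - ordered[0][0]
    if span == 0:
        # Single timestamp: fall back to the plain mean of the values.
        return sum(v for _, v in ordered) / len(ordered)
    area = 0.0
    for (t0, v0), (t1, v1) in zip(ordered, ordered[1:]):
        area += 0.5 * (v0 + v1) * (t1 - t0)
    return area / span


def hours_since_last(obs: List[Observation], window_end: float) -> Optional[float]:
    """Gap between the most recent measurement and the end of the window."""
    if not obs:
        return None
    return window_end - max(t for t, _ in obs)


def extract_features(obs: List[Observation], window_end: float) -> Dict[str, Optional[float]]:
    """Turn one patient's irregular series into a fixed-length tabular row."""
    return {
        "n_obs": float(len(obs)),
        "time_weighted_mean": time_weighted_mean(obs),
        "hours_since_last": hours_since_last(obs, window_end),
    }


# Hypothetical heart-rate series: two readings close together, then a long gap.
heart_rate = [(0.0, 80.0), (1.0, 100.0), (5.0, 100.0)]
row = extract_features(heart_rate, window_end=6.0)
```

For this series the unweighted mean of the three readings is about 93.3, while the time-weighted mean is 98.0, because the value sat at 100 for most of the window. Returning `None` for a never-measured variable keeps missingness explicit instead of silently imputing a value.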

Code & Models

Repositories

hojjatkarami/FeatEHR-LLM (GitHub)
