Leveraging electronic health records for atrial fibrillation cohort generation

Ane G. Domingo-Aldama; Marcos Merino Prado; Alain García-Olea; Josu Goikoetxea; Koldo Gojenola; Aitziber Atutxa

PMC · DOI: 10.1007/s13755-025-00415-w · January 7, 2026

Open Access

TL;DR

This paper explores using AI to automate patient selection for atrial fibrillation and heart failure studies, showing that language models can perform as well as rule-based methods with less manual effort.

Contribution

The study evaluates the effectiveness of natural language processing and large language models in cohort selection for atrial fibrillation and heart failure in non-English clinical settings.

Findings

1. Discharge reports can be used effectively for automatic cohort selection with both rule-based and LLM approaches.

2. LLMs performed well on temporal reasoning when explicit dates were included, though they struggled with long-context inputs.

3. Smaller general-domain models sometimes outperformed larger or medical-specific models.
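The finding on temporal reasoning suggests that making dates explicit helps LLMs reason over a patient's history. The Python sketch below shows one possible way to assemble such an input: a patient's discharge reports are sorted chronologically, each is prefixed with its discharge date, and the screening question is appended at the end. The `DischargeReport` type, the `build_timeline_prompt` helper, the header format and the sample report texts are all hypothetical illustrations, not the authors' prompt or data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DischargeReport:
    # Hypothetical record type: one discharge report per hospital stay.
    patient_id: str
    discharge_date: date
    text: str


def build_timeline_prompt(reports: list[DischargeReport], question: str) -> str:
    """Hypothetical helper: join a patient's reports in chronological order,
    each headed by an explicit ISO-formatted discharge date, then append
    the screening question."""
    ordered = sorted(reports, key=lambda r: r.discharge_date)
    sections = [
        f"[Report {i} | discharge date: {r.discharge_date.isoformat()}]\n{r.text.strip()}"
        for i, r in enumerate(ordered, start=1)
    ]
    return "\n\n".join(sections) + f"\n\nQuestion: {question}\nAnswer yes or no."


# Usage with invented sample texts, deliberately given out of order.
reports = [
    DischargeReport("p1", date(2021, 5, 3), "Persistent AF, electrical cardioversion."),
    DischargeReport("p1", date(2019, 2, 14), "Paroxysmal AF, started anticoagulation."),
]
prompt = build_timeline_prompt(reports, "Did the patient's AF progress over time?")
print(prompt)
```

Because every report goes into a single prompt, the input grows with each admission, which is where the long-context difficulty noted in the same finding would come into play.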

Abstract

Cohort selection and eligibility screening are critical in clinical research, especially in trials where manual patient matching remains a major bottleneck. This study investigates the use of Natural Language Processing and Large Language Models (LLMs) in two real-world use cases, namely Atrial Fibrillation (AF) progression and Heart Failure (HF) decompensation, within a non-English clinical context. We specifically address the following research questions: (1) Can discharge reports and NLP support cohort selection? (2) Can LLMs effectively model longitudinal patient trajectories and perform temporal reasoning? (3) Do general-purpose or domain-adapted LLMs outperform rule-based baselines for this task? (4) Compared to large foundation models, do small-scale LLMs offer similar performance? A dataset of 212 patients was manually annotated for AF progression using discharge reports. Two strategies…
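Research question (3) compares LLMs against rule-based baselines. As a rough illustration of what such a baseline might look like for AF progression, the Python sketch below assumes, hypothetically, that progression means moving from a less sustained to a more sustained AF subtype (paroxysmal, persistent, long-standing persistent, permanent) across a patient's dated discharge reports. The English keyword patterns, `AF_STAGES`, `report_stage` and `has_progression` are invented for this example and are not the study's rules; the study's reports are non-English, and a practical rule set would also need negation and history handling, which this sketch omits.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical ordering of AF subtypes, from least to most sustained.
AF_STAGES = {
    "paroxysmal": 0,
    "persistent": 1,
    "long-standing persistent": 2,
    "permanent": 3,
}

# Hypothetical English pattern: "<subtype> AF" or "<subtype> atrial fibrillation".
# Longest subtypes are tried first so "long-standing persistent" beats "persistent".
_STAGE_RE = re.compile(
    r"\b("
    + "|".join(sorted(map(re.escape, AF_STAGES), key=len, reverse=True))
    + r")\s+(?:AF|atrial fibrillation)\b",
    re.IGNORECASE,
)


def report_stage(text: str) -> Optional[int]:
    """Return the most sustained AF stage mentioned in one report, or None.
    Note: negated mentions ("no persistent AF") are NOT filtered out here."""
    stages = [AF_STAGES[m.group(1).lower()] for m in _STAGE_RE.finditer(text)]
    return max(stages) if stages else None


def has_progression(reports: list[tuple[date, str]]) -> bool:
    """Flag a patient if, in date order, any report mentions a more sustained
    AF stage than the least sustained stage seen in an earlier report."""
    lowest_so_far: Optional[int] = None
    for _, text in sorted(reports, key=lambda r: r[0]):
        stage = report_stage(text)
        if stage is None:
            continue  # report does not mention an AF subtype
        if lowest_so_far is not None and stage > lowest_so_far:
            return True
        lowest_so_far = stage if lowest_so_far is None else min(lowest_so_far, stage)
    return False


# Usage with invented sample reports for a single patient.
reports = [
    (date(2021, 5, 3), "Persistent AF; cardioversion performed."),
    (date(2019, 2, 14), "Diagnosis: paroxysmal atrial fibrillation."),
]
print(has_progression(reports))
```

Keyword rules like these are transparent and cheap to run, but each new phrasing or language variant has to be added by hand, which is the kind of manual effort the LLM approaches in the study aim to reduce.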

Linked entities

Species (1)

Homo sapiens (human)

Diseases (4)

Atrial Fibrillation, Heart Failure, HF, AF

Figures (15)

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Atrial Fibrillation Management and Outcomes