Leveraging electronic health records for atrial fibrillation cohort generation
Ane G. Domingo-Aldama, Marcos Merino Prado, Alain García-Olea, Josu Goikoetxea, Koldo Gojenola, Aitziber Atutxa

TL;DR
This paper explores using AI to automate patient selection for atrial fibrillation and heart failure studies, showing that language models can perform as well as traditional methods with less effort.
Contribution
The study evaluates the effectiveness of natural language processing and large language models in cohort selection for atrial fibrillation and heart failure in non-English clinical settings.
Findings
Discharge reports can be effectively used for automatic cohort selection with rule-based and LLM approaches.
LLMs performed well in temporal reasoning when explicit dates were included, though they struggled with long-context inputs.
Smaller general-domain models sometimes outperformed larger or medical-specific models.
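To make the rule-based side of these findings concrete, here is a minimal sketch of keyword rules applied to dated discharge reports to flag possible AF progression. It is not the authors' method: the `Report` type, the English keyword patterns, the three-stage ordering (paroxysmal, persistent, permanent), and the function names are all hypothetical, and the study itself works with non-English reports whose actual rules are not given here.

```python
import re
from dataclasses import dataclass
from datetime import date

# Hypothetical keyword patterns (assumption): the paper's real rules and
# its non-English vocabulary are not reproduced here.
AF_STAGE_PATTERNS = {
    "paroxysmal": re.compile(r"\bparoxysmal\s+(atrial\s+fibrillation|af)\b", re.I),
    "persistent": re.compile(r"\bpersistent\s+(atrial\s+fibrillation|af)\b", re.I),
    "permanent": re.compile(r"\bpermanent\s+(atrial\s+fibrillation|af)\b", re.I),
}
# Assumed ordering of stages, from least to most advanced.
STAGE_ORDER = {"paroxysmal": 0, "persistent": 1, "permanent": 2}


@dataclass
class Report:
    patient_id: str
    discharge_date: date  # explicit date attached to each report
    text: str


def stage_of(text):
    """Return the most advanced AF stage mentioned in text, or None."""
    found = [s for s, p in AF_STAGE_PATTERNS.items() if p.search(text)]
    return max(found, key=STAGE_ORDER.get) if found else None


def has_progressed(reports):
    """True if a later report mentions a more advanced stage than an earlier one."""
    lowest_rank_seen = None
    # Sort by explicit discharge date so the comparison is chronological.
    for report in sorted(reports, key=lambda r: r.discharge_date):
        stage = stage_of(report.text)
        if stage is None:
            continue
        rank = STAGE_ORDER[stage]
        if lowest_rank_seen is not None and rank > lowest_rank_seen:
            return True
        if lowest_rank_seen is None or rank < lowest_rank_seen:
            lowest_rank_seen = rank
    return False


reports = [
    Report("p1", date(2021, 5, 1), "Persistent AF noted."),
    Report("p1", date(2019, 1, 1), "Paroxysmal atrial fibrillation."),
]
print(has_progressed(reports))
```

Sorting on an explicit `discharge_date` keeps the temporal comparison out of the text matching itself, which loosely mirrors the finding that explicit dates help temporal reasoning; a real system would also need negation handling and language-specific patterns.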
Abstract
Cohort selection and eligibility screening are critical in clinical research, especially in trials where manual patient matching remains a major bottleneck. This study investigates the use of Natural Language Processing (NLP) and Large Language Models (LLMs) in two real use cases, namely Atrial Fibrillation (AF) progression and Heart Failure (HF) decompensation, within a non-English clinical context. We specifically address the following research questions: (1) Can discharge reports and NLP support cohort selection? (2) Can LLMs effectively model longitudinal patient trajectories and temporal reasoning? (3) Do general-purpose or domain-adapted LLMs outperform rule-based baselines for this task? (4) Compared to large foundation models, do small-scale LLMs offer similar performance? A dataset of 212 patients was manually annotated for AF progression using discharge reports. Two strategies…
Taxonomy
Topics: Machine Learning in Healthcare · Topic Modeling · Atrial Fibrillation Management and Outcomes
