Retrieval-Augmented LLMs for Evidence Localization in Clinical Trial Recruitment from Longitudinal EHR Narratives

Ziyi Chen; Mengxian Lyu; Cheng Peng; Yonghui Wu

arXiv:2604.05190·cs.CL·April 30, 2026

TL;DR

This study evaluates encoder- and decoder-based large language models, together with three strategies for handling long documents, for clinical trial patient screening from long electronic health record narratives. It reports that generative LLMs with retrieval-augmented generation (RAG) outperform the other methods.

Contribution

The paper systematically compares encoder- and decoder-based LLMs and explores three strategies for handling long documents, reporting state-of-the-art results on a clinical trial screening benchmark.

Findings

01. MedGemma with RAG achieved a micro-F1 score of 89.05%.

02. Generative LLMs excel in long-term reasoning across lengthy documents.

03. Specific criteria are needed to select optimal LLM strategies for real-world adoption.
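
For readers unfamiliar with the metric in the first finding, the sketch below shows generic micro-averaged F1 for binary met / not-met decisions: true positives, false positives, and false negatives are pooled across all eligibility criteria before a single F1 is computed. This is an illustration only. The function name `micro_f1`, the criterion labels, and the toy data are hypothetical, and the benchmark's official scorer may aggregate differently.

```python
from typing import Dict, List, Tuple

# Each pair is (gold_label, predicted_label); True means the criterion is "met".
Pairs = List[Tuple[bool, bool]]


def micro_f1(pairs_by_criterion: Dict[str, Pairs]) -> float:
    """Pool TP/FP/FN over every criterion, then compute one F1 score."""
    tp = fp = fn = 0
    for pairs in pairs_by_criterion.values():
        for gold, pred in pairs:
            if gold and pred:
                tp += 1  # correctly predicted "met"
            elif pred and not gold:
                fp += 1  # predicted "met" but gold is "not met"
            elif gold and not pred:
                fn += 1  # missed a "met" criterion
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy data for two made-up criteria.
    toy = {
        "CRITERION-A": [(True, True), (False, True)],
        "CRITERION-B": [(True, False), (True, True)],
    }
    print(f"micro-F1 = {micro_f1(toy):.4f}")
```

Because counts are pooled before averaging, criteria with more positive examples weigh more heavily in the final score than rare ones.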

Abstract

Screening patients for enrollment is a well-known, labor-intensive bottleneck that leads to under-enrollment and, ultimately, trial failures. Recent breakthroughs in large language models (LLMs) offer a promising opportunity to use artificial intelligence to improve screening. This study systematically explored both encoder- and decoder-based generative LLMs for screening clinical narratives to facilitate clinical trial recruitment. We examined both general-purpose LLMs and medical-adapted LLMs and explored three strategies to alleviate the "Lost in the Middle" issue when handling long documents, including 1) Original long-context: using the default context windows of LLMs, 2) NER-based extractive summarization: converting the long document into summarizations using named entity recognition, 3) RAG: dynamic evidence retrieval based on eligibility criteria. The 2018 N2C2 Track 1…
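
The third strategy in the abstract, retrieving evidence dynamically per eligibility criterion, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it swaps in plain bag-of-words cosine similarity where a real system would more likely use a learned embedding model, and the names `chunk_sentences` and `retrieve`, the sentence-based chunking, and the sample note are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_sentences(note: str, size: int = 1) -> List[str]:
    """Split a note into chunks of `size` consecutive sentences (assumed chunking)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def retrieve(note: str, criterion: str, k: int = 2, size: int = 1) -> List[str]:
    """Return the k chunks most similar to the eligibility criterion text."""
    query = Counter(tokenize(criterion))
    chunks = chunk_sentences(note, size)
    ranked = sorted(
        chunks,
        key=lambda c: cosine(Counter(tokenize(c)), query),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical note and criterion, for illustration only.
    note = (
        "Patient seen for follow-up. History of myocardial infarction in 2010. "
        "Denies alcohol use. Takes aspirin daily for prevention."
    )
    criterion = "Use of aspirin to prevent myocardial infarction"
    for chunk in retrieve(note, criterion, k=2):
        print(chunk)
```

Only the retrieved chunks, rather than the full longitudinal record, would then be passed to the LLM together with the criterion, which keeps the prompt short and is the intuition behind using retrieval to sidestep the "Lost in the Middle" issue.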
