Automated HIV Screening on Dutch Electronic Health Records with Large Language Models

Lang Zhou; Amrish Jhingoer; Yinghao Luo; Klaske Vliegenthart-Jongbloed; Carlijn Jordans; Ben Werkhoven; Tom Seinen; Erik van Mulligen; Casper Rokx; Yunlei Li

arXiv:2510.19879 · cs.CL · October 28, 2025

TL;DR

This paper presents a pipeline that uses a Large Language Model to analyze unstructured clinical notes in electronic health records and decide whether a patient should be referred for further HIV testing, reporting high accuracy and a low false negative rate.

Contribution

The paper introduces a pipeline that uses an LLM to extract information relevant to HIV risk from unstructured EHR text, a data source that earlier work on structured data has largely overlooked.

Findings

01. High accuracy in HIV screening from EHR text

02. Low false negative rate in identifying at-risk patients

03. Effective use of LLMs on clinical notes

Abstract

Efficient screening and early diagnosis of HIV are critical for reducing onward transmission. Although large-scale laboratory testing is not feasible, the widespread adoption of Electronic Health Records (EHRs) offers new opportunities to address this challenge. Existing research primarily focuses on applying machine learning methods to structured data, such as patient demographics, to improve HIV diagnosis. However, these approaches often overlook unstructured text data such as clinical notes, which may contain valuable information relevant to HIV risk. In this study, we propose a novel pipeline that leverages a Large Language Model (LLM) to analyze unstructured EHR text and determine a patient's eligibility for further HIV testing. Experimental results on clinical data from Erasmus University Medical Center Rotterdam demonstrate that our pipeline achieved high accuracy…
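The two quantities this page highlights, accuracy and the false negative rate, can both be read off a binary confusion matrix. The sketch below is illustrative only: the function name `screening_metrics`, the label convention (1 = eligible for further HIV testing), and the example labels are assumptions made here, not code or data from the study.

```python
def screening_metrics(y_true, y_pred):
    """Compute accuracy and false negative rate for binary screening labels.

    Assumed convention (not from the paper): 1 = eligible for further
    HIV testing, 0 = not eligible.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly cleared
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed at-risk patients

    total = len(pairs)
    positives = tp + fn

    accuracy = (tp + tn) / total if total else float("nan")
    # False negative rate: share of truly eligible patients the screen missed.
    false_negative_rate = fn / positives if positives else float("nan")
    return {"accuracy": accuracy, "false_negative_rate": false_negative_rate}


# Made-up labels for illustration; these are not results from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = screening_metrics(y_true, y_pred)
print(metrics)
```

In a screening setting the false negative rate is usually the more consequential number, since each false negative is an at-risk patient who is not referred for testing.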
