LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs
K M Sajjadul Islam, Ayesha Siddika Nipu, Jiawei Wu, Praveen Madiraju

TL;DR
This paper presents a prompt ensemble approach that uses large language models such as GPT-4o to improve the accuracy and reliability of medical entity recognition from unstructured clinical text in EHRs, outperforming DeepSeek-R1 on the same task.
Contribution
It introduces an ensemble prompting strategy for medical NER with LLMs and shows that it achieves higher performance and reliability than single-prompt (zero-shot and few-shot) strategies.
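An ensemble prompt strategy starts from several prompt variants for the same clinical note. The abstract names zero-shot and few-shot prompting and the entity types problem, test, and treatment, but not the prompt wording or the output format. The template text, the `entity | label` answer format, and the helper names below (`zero_shot_prompt`, `few_shot_prompt`, `prompt_variants`, `parse_response`) are therefore hypothetical. This is a minimal sketch that only builds prompt strings and parses replies. It does not call GPT-4o or DeepSeek-R1.

```python
from dataclasses import dataclass

# Entity types named in the abstract; the template wording below is hypothetical.
ENTITY_TYPES = ("problem", "test", "treatment")

INSTRUCTION = (
    "Extract all medical entities from the clinical note below. "
    f"Label each entity as one of: {', '.join(ENTITY_TYPES)}. "
    "Answer with one 'entity | label' pair per line."
)


@dataclass(frozen=True)
class Example:
    """An annotated demonstration note used for few-shot prompting."""
    text: str
    entities: tuple[tuple[str, str], ...]  # (span, label) pairs


def format_entities(entities) -> str:
    """Render (span, label) pairs in the assumed 'entity | label' answer format."""
    return "\n".join(f"{span} | {label}" for span, label in entities)


def zero_shot_prompt(note: str) -> str:
    """Instruction plus the target note, with no demonstrations."""
    return f"{INSTRUCTION}\n\nNote: {note}\nEntities:"


def few_shot_prompt(note: str, examples: list[Example]) -> str:
    """Instruction, worked demonstrations, then the target note."""
    demos = "\n\n".join(
        f"Note: {ex.text}\nEntities:\n{format_entities(ex.entities)}" for ex in examples
    )
    return f"{INSTRUCTION}\n\n{demos}\n\nNote: {note}\nEntities:"


def prompt_variants(note: str, examples: list[Example]) -> list[str]:
    """Zero-shot, one-shot, and few-shot prompts whose outputs an ensemble would combine."""
    return [
        zero_shot_prompt(note),
        few_shot_prompt(note, examples[:1]),
        few_shot_prompt(note, examples),
    ]


def parse_response(response: str) -> list[tuple[str, str]]:
    """Parse 'entity | label' lines, dropping malformed lines and unknown labels."""
    pairs = []
    for line in response.splitlines():
        if "|" not in line:
            continue
        span, label = (part.strip() for part in line.split("|", 1))
        label = label.lower()
        if span and label in ENTITY_TYPES:
            pairs.append((span, label))
    return pairs
```

Each model reply would be passed through `parse_response`, yielding one entity list per prompt variant. An ensemble step then combines those lists.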
Findings
GPT-4o with prompt ensemble achieved an F1-score of 0.95.
The ensemble method raised recall to 0.98.
GPT-4o outperformed DeepSeek-R1 on medical entity recognition.
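The findings report F1 and recall, but this summary does not say how predicted entities were matched against the gold annotations. A minimal sketch of strict entity-level scoring, assuming an entity counts as correct only when its document, span, and label all match exactly (`entity_prf` and the toy annotations are hypothetical):

```python
def entity_prf(gold, predicted):
    """Strict entity-level precision, recall, and F1.

    gold, predicted: iterables of hashable entity tuples such as (doc_id, span, label).
    A prediction counts as correct only if every field matches a gold entity exactly
    (an assumed criterion; relaxed or partial matching would give different numbers).
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # true positives
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy annotations, made up for illustration.
    gold = [
        (1, "fever", "problem"),
        (1, "chest x-ray", "test"),
        (1, "aspirin", "treatment"),
        (2, "cough", "problem"),
    ]
    predicted = gold + [(2, "pain", "problem")]  # one spurious entity
    p, r, f = entity_prf(gold, predicted)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # → P=0.80 R=1.00 F1=0.89
```

Because F1 is the harmonic mean of precision and recall, it always lies between them when all three come from the same pooled counts. In that setting, a recall above F1 (0.98 vs 0.95 in the findings) implies a precision below F1, as the spurious prediction in the toy example shows.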
Abstract
Electronic Health Records (EHRs) are digital records of patient information, often containing unstructured clinical text. Named Entity Recognition (NER) is essential in EHRs for extracting key medical entities like problems, tests, and treatments to support downstream clinical applications. This paper explores prompt-based medical entity recognition using large language models (LLMs), specifically GPT-4o and DeepSeek-R1, guided by various prompt engineering techniques, including zero-shot, few-shot, and an ensemble approach. Among all strategies, GPT-4o with prompt ensemble achieved the highest classification performance with an F1-score of 0.95 and recall of 0.98, outperforming DeepSeek-R1 on the task. The ensemble method improved reliability by aggregating outputs through embedding-based similarity and majority voting.
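The abstract says the ensemble aggregates outputs "through embedding-based similarity and majority voting" but gives no further detail. The sketch below is one plausible reading, not the authors' implementation. Mentions from different prompts are grouped when their embeddings are cosine-similar above a threshold, and a group survives only if a strict majority of prompts produced it. The `embed` function is a hypothetical character-trigram stand-in for a real embedding model. `threshold=0.8`, the strict-majority rule, and the names `cosine` and `ensemble_entities` are also assumptions.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in embedding: character-trigram counts of the lowercased span.
    A real pipeline would call a sentence-embedding model here instead."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ensemble_entities(outputs, threshold=0.8):
    """Merge per-prompt entity lists by embedding similarity and majority voting.

    outputs: one list of (span, label) pairs per prompt.
    A mention joins the most similar existing cluster if cosine >= threshold,
    otherwise it starts a new cluster. A cluster is kept when a strict majority
    of prompts contributed to it; its span and label are the most frequent ones
    among its mentions (ties go to the first seen).
    """
    clusters = []  # each: {"rep": first mention's vector, "mentions": [(prompt_idx, span, label)]}
    for idx, entities in enumerate(outputs):
        for span, label in entities:
            vec = embed(span)
            best = max(clusters, key=lambda c: cosine(vec, c["rep"]), default=None)
            if best is not None and cosine(vec, best["rep"]) >= threshold:
                best["mentions"].append((idx, span, label))
            else:
                clusters.append({"rep": vec, "mentions": [(idx, span, label)]})

    merged = []
    for cluster in clusters:
        voters = {idx for idx, _, _ in cluster["mentions"]}  # count each prompt once
        if 2 * len(voters) > len(outputs):
            span = Counter(s for _, s, _ in cluster["mentions"]).most_common(1)[0][0]
            label = Counter(lab for _, _, lab in cluster["mentions"]).most_common(1)[0][0]
            merged.append((span, label))
    return merged


if __name__ == "__main__":
    outputs = [
        [("chest x-ray", "test"), ("pneumonia", "problem")],
        [("Chest X-ray", "test"), ("pneumonia", "problem"), ("fever", "problem")],
        [("chest x-rays", "test"), ("antibiotics", "treatment")],
    ]
    print(ensemble_entities(outputs))
    # → [('chest x-ray', 'test'), ('pneumonia', 'problem')]
```

In the usage example, the three surface forms of "chest x-ray" fall into one cluster, so all three prompts vote for it. "fever" and "antibiotics" each come from a single prompt and are dropped. Comparing each new mention against the first mention of a cluster keeps the sketch short; a running centroid would make clustering less sensitive to the order of the prompts.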
