A Systematic Analysis of Large Language Models with RAG-enabled Dynamic Prompting for Medical Error Detection and Correction

Farzad Ahmed; Joniel Augustine Jerome; Meliha Yetisgen; Özlem Uzuner

arXiv:2511.19858 · cs.CL · November 27, 2025

Open Access

TL;DR

This paper systematically evaluates different prompting strategies for large language models in detecting and correcting medical errors, demonstrating that retrieval-augmented dynamic prompting significantly improves performance and reliability.

Contribution

It introduces and empirically validates retrieval-augmented dynamic prompting as a superior method for medical error detection and correction with LLMs.

Findings

1. RDP reduces false positives by about 15%.

2. RDP improves recall in error sentence detection by 5-10%.

3. RDP generates more contextually accurate corrections.
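
The false-positive and recall figures above rest on confusion-matrix quantities. As a minimal sketch, assuming the standard textbook definitions (this page does not show the paper's exact evaluation code), accuracy, recall, and false-positive rate for binary error flags could be computed as follows. The function name and the toy labels are illustrative, not taken from the paper.

```python
def detection_metrics(gold, pred):
    """Accuracy, recall, and false-positive rate for binary error flags.

    gold and pred are equal-length sequences of 0/1, where 1 means
    "this text contains an error". Standard definitions are assumed.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 when the denominator is empty instead of raising.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(gold)),
        "recall": ratio(tp, tp + fn),   # share of true errors that were flagged
        "fpr": ratio(fp, fp + tn),      # share of clean texts wrongly flagged
    }


# Toy labels invented for illustration only.
gold = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 0, 1, 1, 0, 0, 0, 1]
metrics = detection_metrics(gold, pred)
print(metrics)  # {'accuracy': 0.75, 'recall': 0.75, 'fpr': 0.25}
```

A lower false-positive rate means fewer correct clinical texts are flagged as erroneous, which is why it is reported alongside recall rather than folded into accuracy alone.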

Abstract

Objective: Clinical documentation contains factual, diagnostic, and management errors that can compromise patient safety. Large language models (LLMs) may help detect and correct such errors, but their behavior under different prompting strategies remains unclear. We evaluate zero-shot prompting, static prompting with random exemplars (SPR), and retrieval-augmented dynamic prompting (RDP) for three subtasks of medical error processing: error flag detection, error sentence detection, and error correction. Methods: Using the MEDEC dataset, we evaluated nine instruction-tuned LLMs (GPT, Claude, Gemini, and OpenAI o-series models). We measured performance using accuracy, recall, false-positive rate (FPR), and an aggregate score of ROUGE-1, BLEURT, and BERTScore for error correction. We also analyzed example outputs to identify failure modes and differences between LLM and clinician…
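
The three prompting strategies named in the abstract differ only in how exemplars are chosen. The sketch below illustrates that difference under loud assumptions: the exemplar pool, the instruction wording, and the bag-of-words cosine retriever are all hypothetical stand-ins, since this page does not describe the retriever or prompt templates the authors used.

```python
import math
import random
import re
from collections import Counter

# Hypothetical exemplar pool: (clinical text, label) pairs invented for illustration.
POOL = [
    ("Patient with chest pain and ST elevation was diagnosed with pneumonia.", "error"),
    ("Patient with fever and productive cough was diagnosed with pneumonia.", "no error"),
    ("Patient with polyuria and high glucose was started on insulin.", "no error"),
    ("Patient with a fractured wrist was treated with antibiotics only.", "error"),
]

# Hypothetical instruction text, not the paper's prompt.
INSTRUCTION = "Does the following clinical text contain a medical error? Reply 'error' or 'no error'."


def tokens(text):
    """Lowercased bag of words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_prompt(query, exemplars):
    """Instruction, then any exemplars, then the query left open for the model."""
    parts = [INSTRUCTION]
    parts += [f"Text: {text}\nAnswer: {label}" for text, label in exemplars]
    parts.append(f"Text: {query}\nAnswer:")
    return "\n\n".join(parts)


def zero_shot_prompt(query):
    # Zero-shot: no exemplars at all.
    return build_prompt(query, [])


def static_random_prompt(query, pool, k, seed=0):
    # SPR: k exemplars drawn at random, independent of the query.
    return build_prompt(query, random.Random(seed).sample(pool, k))


def retrieve(query, pool, k):
    # Rank the pool by similarity to the query and keep the top k.
    q = tokens(query)
    return sorted(pool, key=lambda ex: cosine(q, tokens(ex[0])), reverse=True)[:k]


def dynamic_prompt(query, pool, k):
    # RDP: exemplars are retrieved per query, so the prompt changes with the input.
    return build_prompt(query, retrieve(query, pool, k))


query = "Patient with chest pain and ST elevation on ECG was diagnosed with bronchitis."
print(dynamic_prompt(query, POOL, k=2))
```

With this toy pool, the retriever surfaces the chest-pain exemplar first because it shares the most vocabulary with the query, whereas the random variant may pick unrelated cases. A real system would more plausibly use dense embeddings for retrieval; the token-overlap scorer here only keeps the example dependency-free.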

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Machine Learning in Healthcare