BLUEmed: Retrieval-Augmented Multi-Agent Debate for Clinical Error Detection

Saukun Thika You; Nguyen Anh Khoa Tran; Wesley K. Marizane; Hanshu Rao; Qiunan Zhang; Xiaolei Huang

arXiv:2604.10389·cs.CL·April 14, 2026

TL;DR

BLUEmed is a multi-agent debate framework with retrieval-augmented generation that improves clinical error detection in medical notes by combining evidence-grounded reasoning and multi-perspective verification.

Contribution

BLUEmed introduces a multi-agent debate system with hybrid retrieval-augmented generation for clinical error detection, and it outperforms single-agent RAG and debate-only baselines.

Findings

01. Achieves 69.13% accuracy in clinical terminology substitution detection.

02. Outperforms single-agent RAG and debate-only baselines in experiments.

03. Retrieval augmentation and structured debate are complementary, improving detection performance.

Abstract

Terminology substitution errors in clinical notes, where one medical term is replaced by a linguistically valid but clinically different term, pose a persistent challenge for automated error detection in healthcare. We introduce BLUEmed, a multi-agent debate framework augmented with hybrid Retrieval-Augmented Generation (RAG) that combines evidence-grounded reasoning with multi-perspective verification for clinical error detection. BLUEmed decomposes each clinical note into focused sub-queries, retrieves source-partitioned evidence through dense, sparse, and online retrieval, and assigns two domain expert agents distinct knowledge bases to produce independent analyses; when the experts disagree, a structured counter-argumentation round and cross-source adjudication resolve the conflict, followed by a cascading safety layer that filters common false-positive patterns. We evaluate BLUEmed…
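
The control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`Verdict`, `detect_error`, `decompose`, `adjudicate`, `safety_filters`) is hypothetical, each component is passed in as a plain callable, and the counter-argumentation round and cross-source adjudication are folded into a single `adjudicate` step for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Verdict:
    has_error: bool
    rationale: str


# Hypothetical interfaces: a retriever maps a sub-query to evidence passages,
# and an expert maps (note, its own evidence) to a verdict.
Retriever = Callable[[str], List[str]]
Expert = Callable[[str, List[str]], Verdict]


def detect_error(
    note: str,
    decompose: Callable[[str], List[str]],
    retrievers: Dict[str, Retriever],
    experts: Dict[str, Expert],
    adjudicate: Callable[[str, Dict[str, Verdict], Dict[str, List[str]]], Verdict],
    safety_filters: List[Callable[[str, Verdict], bool]],
) -> Verdict:
    # 1. Break the note into focused sub-queries.
    sub_queries = decompose(note)

    # 2. Source-partitioned retrieval: evidence is kept separate per source,
    #    so each expert only sees passages from its own knowledge base.
    evidence = {
        name: [passage for q in sub_queries for passage in retrieve(q)]
        for name, retrieve in retrievers.items()
    }

    # 3. Independent expert analyses (expert keys must match retriever keys).
    verdicts = {name: expert(note, evidence[name]) for name, expert in experts.items()}

    # 4. If the experts agree, take their shared verdict; otherwise hand the
    #    conflict to the debate/adjudication step.
    if len({v.has_error for v in verdicts.values()}) == 1:
        final = next(iter(verdicts.values()))
    else:
        final = adjudicate(note, verdicts, evidence)

    # 5. Cascading safety layer: filters run in order, and the first one that
    #    matches a known false-positive pattern suppresses the flag.
    if final.has_error:
        for is_false_positive in safety_filters:
            if is_false_positive(note, final):
                return Verdict(False, "suppressed by safety filter")
    return final
```

In this sketch the debate step runs only on disagreement, which mirrors the abstract's "when the experts disagree" condition; how the real system prompts its agents or scores evidence is not specified here.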
