When Evidence Conflicts: Uncertainty and Order Effects in Retrieval-Augmented Biomedical Question Answering

Yikun Han; Mengfei Lan; Halil Kilicoglu

arXiv:2605.14115 · cs.CL · May 15, 2026

TL;DR

This paper evaluates biomedical retrieval-augmented LLMs when the retrieved evidence conflicts. It shows that accuracy drops when the order of conflicting documents is reversed, and it proposes a conflict-aware abstention method to improve reliability.

Contribution

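It introduces a systematic evaluation of six open-weight LLMs under conflicting biomedical evidence and proposes an abstention score that combines model confidence with an evidence-conflict detector, so that the model's decisions to answer or abstain are more reliable.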
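The summary does not say how confidence and the conflict detector are combined into one score. Under that caveat, here is a minimal Python sketch. It assumes a hypothetical weighted sum of (1 − confidence) and a detector's conflict probability, and it abstains whenever the score exceeds a threshold. Selective accuracy is then the accuracy on the questions the model still answers, reported alongside coverage. `Prediction`, `abstention_score`, `selective_accuracy`, `weight`, and `threshold` are illustrative names, not the authors' API.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    correct: bool         # did the model's answer match the gold label?
    confidence: float     # model confidence in [0, 1] (assumed to be given)
    conflict_prob: float  # hypothetical detector's probability that the evidence conflicts


def abstention_score(pred: Prediction, weight: float = 0.5) -> float:
    """Higher means stronger grounds to abstain.

    Assumed form: a weighted sum of low confidence and detected conflict.
    The paper's actual combination may differ.
    """
    return weight * (1.0 - pred.confidence) + (1.0 - weight) * pred.conflict_prob


def selective_accuracy(preds: list[Prediction], threshold: float,
                       weight: float = 0.5) -> tuple[float, float]:
    """Return (accuracy on answered items, coverage) when abstaining above threshold."""
    answered = [p for p in preds if abstention_score(p, weight) <= threshold]
    coverage = len(answered) / len(preds) if preds else 0.0
    if not answered:
        return float("nan"), coverage  # accuracy is undefined with no answers
    accuracy = sum(p.correct for p in answered) / len(answered)
    return accuracy, coverage


# Toy example with made-up values (not results from the paper).
preds = [
    Prediction(correct=True, confidence=0.9, conflict_prob=0.1),   # score ~0.10
    Prediction(correct=False, confidence=0.8, conflict_prob=0.9),  # score ~0.55
    Prediction(correct=True, confidence=0.6, conflict_prob=0.2),   # score ~0.30
    Prediction(correct=False, confidence=0.4, conflict_prob=0.7),  # score ~0.65
]
print(selective_accuracy(preds, threshold=1.0))  # answer everything: (0.5, 1.0)
print(selective_accuracy(preds, threshold=0.5))  # abstain on the two high-conflict items: (1.0, 0.5)
```

Lowering the threshold trades coverage for accuracy. The conflict term lets the score abstain even when the model is confident, as in the second toy item.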

Findings

1. Reversing the order of the correct and contradictory documents lowers accuracy for every model and flips 11.4%–25.2% of predictions.
2. A conflict-aware abstention score improves selective accuracy.
3. Conflicting evidence affects both model uncertainty and robustness.
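The order effect can be measured by running the same questions under both document orders and comparing the answers question by question. The minimal Python sketch below makes two assumptions: answers are string labels, and a "flip" is any change in the predicted answer between the two orders. The paper may define a flip more narrowly. `accuracy`, `flip_rate`, and the toy data are illustrative.

```python
def accuracy(preds: list[str], gold: list[str]) -> float:
    """Fraction of predictions that match the gold answers."""
    if len(preds) != len(gold):
        raise ValueError("preds and gold must have the same length")
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def flip_rate(preds_a: list[str], preds_b: list[str]) -> float:
    """Fraction of questions whose answer changes between two document orders."""
    if len(preds_a) != len(preds_b):
        raise ValueError("both runs must cover the same questions")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


# Toy data (not from the paper): the same four questions answered with the
# correct and contradictory documents in one order (a), then reversed (b).
gold = ["yes", "no", "yes", "no"]
order_a = ["yes", "no", "yes", "yes"]
order_b = ["no", "no", "yes", "yes"]

print(accuracy(order_a, gold), accuracy(order_b, gold))  # 0.75 0.5
print(flip_rate(order_a, order_b))                       # 0.25
```

The flip rate reports something that the two accuracies alone cannot show. Two runs could have identical accuracy while still disagreeing on many individual questions.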

Abstract

Biomedical retrieval-augmented large language models (LLMs) often face evidence that is incomplete, misleading, or internally contradictory, yet evaluation usually emphasizes answer accuracy under helpful context rather than reliability under conflict. Using HealthContradict, we evaluate six open-weight LLMs under five controlled evidence conditions: no retrieved context, correct-only context, incorrect-only context, and two mixed conditions containing both correct and contradictory documents in opposite orders. In this conflicting-evidence order contrast, where the same two documents are both present and only their order is reversed, accuracy drops for every model and 11.4%–25.2% of predictions flip. To support abstention in these difficult cases, we also evaluate a conflict-aware abstention score that combines model confidence with a detector of evidence conflict. In the two hardest…
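The five evidence conditions differ only in which documents are placed in the context and in what order. The minimal Python sketch below assumes that each question comes with one document supporting the correct answer and one contradicting it. The condition keys, `build_conditions`, `build_prompt`, and the prompt template are all hypothetical and are not the authors' code.

```python
def build_conditions(correct_doc: str, incorrect_doc: str) -> dict[str, list[str]]:
    """Map each of the five evidence conditions to its ordered context documents.

    Condition names are illustrative labels, not the paper's identifiers.
    """
    return {
        "no_context": [],
        "correct_only": [correct_doc],
        "incorrect_only": [incorrect_doc],
        # Both mixed conditions hold the same two documents; only the order differs.
        "mixed_correct_first": [correct_doc, incorrect_doc],
        "mixed_incorrect_first": [incorrect_doc, correct_doc],
    }


def build_prompt(question: str, docs: list[str]) -> str:
    """Hypothetical prompt template: numbered documents, then the question."""
    parts = [f"Document {i}: {doc}" for i, doc in enumerate(docs, start=1)]
    parts.append(f"Question: {question}")
    parts.append("Answer:")
    return "\n\n".join(parts)


conditions = build_conditions("Doc supporting the correct answer.", "Doc contradicting it.")
for name, docs in conditions.items():
    prompt = build_prompt("Does X treat Y?", docs)  # placeholder question
    print(name, len(docs))
```

Holding the document text fixed across the two mixed conditions is what isolates the order effect. Any change in the answer between them can only come from the reordering.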
