Agentic retrieval-augmented reasoning reshapes collective reliability under model variability in radiology question answering

Mina Farajiamiri; Jeta Sopa; Saba Afza; Lisa Adams; Felix Barajas Ordonez; Tri-Thien Nguyen; Mahshad Lotfinia; Sebastian Wind; Keno Bressem; Sven Nebelung; Daniel Truhn; Soroosh Tayebi Arasteh

arXiv:2603.06271·cs.LG·March 9, 2026

Open Access

TL;DR

Across 34 large language models and 169 radiology questions, agentic retrieval-augmented reasoning reduces inter-model decision dispersion and makes correctness more robust across models than zero-shot inference, although strong consensus does not guarantee a correct answer.

Contribution

The study provides empirical evidence that structured retrieval increases inter-model agreement and robustness in clinical decision support tasks, addressing reliability concerns that accuracy alone does not capture.

Findings

01

Agentic retrieval reduces inter-model decision dispersion.

02

Agentic retrieval increases the robustness of correctness across models.

03

Consensus strength correlates with correctness, but high agreement doesn't guarantee accuracy.

Abstract

Agentic retrieval-augmented reasoning pipelines are increasingly used to structure how large language models (LLMs) incorporate external evidence in clinical decision support. These systems iteratively retrieve curated domain knowledge and synthesize it into structured reports before answer selection. Although such pipelines can improve performance, their impact on reliability under model variability remains unclear. In real-world deployment, heterogeneous models may align, diverge, or synchronize errors in ways not captured by accuracy. We evaluated 34 LLMs on 169 expert-curated publicly available radiology questions, comparing zero-shot inference with a radiology-specific multi-step agentic retrieval condition in which all models received identical structured evidence reports derived from curated radiology knowledge. Agentic inference reduced inter-model decision dispersion (median…
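The notion of inter-model decision dispersion can be illustrated with a minimal Python sketch. The excerpt above does not state which dispersion measure the authors used, so the normalized Shannon entropy and the majority-share consensus below are assumptions chosen for illustration. The function names (`answer_entropy`, `consensus_strength`) and the toy answers are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def answer_entropy(answers, num_options):
    """Normalized Shannon entropy of the answers given to one question.

    0.0 means every model picked the same option; 1.0 means the answers
    are spread evenly over all `num_options` options.
    NOTE: an illustrative measure, not necessarily the paper's metric.
    """
    total = len(answers)
    counts = Counter(answers)
    # Sum p * log(1/p) over the options that were actually chosen.
    h = sum((c / total) * math.log(total / c) for c in counts.values())
    return h / math.log(num_options)


def consensus_strength(answers):
    """Fraction of models that chose the most common answer."""
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)


# Hypothetical answers from four models to one four-option question.
zero_shot = ["A", "B", "C", "A"]
agentic = ["A", "A", "A", "B"]

for name, answers in [("zero-shot", zero_shot), ("agentic", agentic)]:
    print(name, answer_entropy(answers, 4), consensus_strength(answers))
```

In this toy case the agentic answers have lower entropy and a larger majority share than the zero-shot answers. Neither quantity says whether the majority answer is correct, which is why consensus and accuracy have to be assessed separately.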

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Multimodal Machine Learning Applications