A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering

Nusrat Sultana; Abdullah Muhammad Moosa; Kazi Afzalur Rahman; Sajal Chandra Banik

arXiv:2604.07274 · cs.CL · April 9, 2026

TL;DR

This paper systematically evaluates retrieval-augmented medical question answering on the MedQA USMLE benchmark, comparing forty configurations of language models, embedding models, retrieval strategies, query reformulation, and cross-encoder reranking. It shows that effective retrieval strategies significantly improve zero-shot performance with modest computational resources.

Contribution

The paper provides a comprehensive analysis of retrieval components in medical QA, identifying the best-performing configurations and highlighting the tradeoffs between effectiveness and computational cost.

Findings

01 Dense retrieval with query reformulation and reranking achieved 60.49% accuracy.

02 Retrieval augmentation significantly improves zero-shot medical question answering.

03 Simpler dense retrieval configurations offer strong performance with higher throughput.

Abstract

Large language models (LLMs) have demonstrated strong capabilities in medical question answering; however, purely parametric models often suffer from knowledge gaps and limited factual grounding. Retrieval-augmented generation (RAG) addresses this limitation by integrating external knowledge retrieval into the reasoning process. Despite increasing interest in RAG-based medical systems, the impact of individual retrieval components on performance remains insufficiently understood. This study presents a systematic evaluation of retrieval-augmented medical question answering using the MedQA USMLE benchmark and a structured textbook-based knowledge corpus. We analyze the interaction between language models, embedding models, retrieval strategies, query reformulation, and cross-encoder reranking within a unified experimental framework comprising forty configurations. Results show that…
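The components named in the abstract (embedding-based retrieval, query reformulation, and reranking) can be pictured as stages of one pipeline that are switched on or off per configuration. The following is only a minimal sketch of that idea, not the authors' implementation: the corpus, the bag-of-words `embed` function, the abbreviation-expanding `reformulate`, and the token-overlap `rerank` are all hypothetical stand-ins for a real embedding model, an LLM-based reformulator, and a cross-encoder.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for textbook passages.
CORPUS = [
    "Metformin is a first-line drug for type 2 diabetes mellitus.",
    "Beta blockers reduce heart rate and myocardial oxygen demand.",
    "Insulin therapy is required in type 1 diabetes mellitus.",
]

def embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def reformulate(question):
    """Stand-in for query reformulation: expands a single abbreviation."""
    return question.replace("T2DM", "type 2 diabetes mellitus")

def retrieve(query, k=2):
    """First-stage retrieval: rank passages by similarity to the query vector."""
    q = embed(query)
    ranked = sorted(CORPUS, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def rerank(query, passages):
    """Stand-in for a cross-encoder: scores each (query, passage) pair
    by the number of distinct tokens they share."""
    q_tokens = set(embed(query))
    return sorted(passages, key=lambda p: len(q_tokens & set(embed(p))), reverse=True)

def build_context(question, k=2, use_reformulation=True, use_reranking=True):
    """Assemble the passages that would be placed in the LLM prompt."""
    query = reformulate(question) if use_reformulation else question
    passages = retrieve(query, k)
    if use_reranking:
        passages = rerank(query, passages)
    return passages

context = build_context("What is the first-line drug for T2DM?")
for passage in context:
    print(passage)
```

Toggling `use_reformulation` and `use_reranking` mirrors, in miniature, how a study of this kind can compare pipeline variants while holding the corpus and question set fixed.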
