Benchmarking Retrieval Strategies for Biomedical Retrieval-Augmented Generation: A Controlled Empirical Study
Devi Prasad Bal, Subhashree Puhan

TL;DR
This study systematically compares five retrieval strategies in a biomedical RAG pipeline. It finds that modeling query-document interaction, as cross-encoder reranking does, improves retrieval quality, and that adding retrieval significantly improves answer relevancy over a no-context baseline.
Contribution
The paper provides a controlled empirical evaluation of retrieval strategies in biomedical RAG, highlighting the effectiveness of cross-encoder reranking and the effect of retrieval on answer relevancy.
Findings
Cross-encoder reranking achieves the highest composite score and contextual precision.
Naive multi-query expansion introduces retrieval noise and reduces precision.
All retrieval strategies outperform no-context baselines in answer relevancy.
Abstract
Retrieval-Augmented Generation (RAG) offers a well-established path to grounding large language model (LLM) outputs in external knowledge, yet the question of which retrieval strategy works best in a high-stakes domain such as biomedicine has not received the controlled, multi-metric treatment it deserves. This paper presents a systematic empirical comparison of five retrieval strategies -- Dense Vector Search, Hybrid BM25 + Dense retrieval, Cross-Encoder Reranking, Multi-Query Expansion, and Maximal Marginal Relevance (MMR) -- within a biomedical question-answering RAG pipeline. All strategies share a fixed generation model (GPT-4o-mini), a common vector store (ChromaDB), and OpenAI's text-embedding-3-small embeddings, ensuring that observed differences are attributable to retrieval alone. Evaluation is conducted on 250 question-answer pairs drawn from a preprocessed subset of the…
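Of the five strategies named in the abstract, Maximal Marginal Relevance is the easiest to show in a few lines. The sketch below is a generic, textbook-style MMR selection loop, not the paper's implementation, which this excerpt does not describe. The function names (`cosine`, `mmr_select`), the `lambda_mult` parameter, and the toy two-dimensional vectors are all illustrative assumptions; a real pipeline would use embeddings from the vector store instead.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedy MMR: return indices of up to k documents.

    Each step picks the candidate maximizing
        lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already selected).
    lambda_mult = 1.0 reduces to plain relevance ranking (hypothetical parameter name).
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy data (assumed for illustration): documents 0 and 1 are near-duplicates,
# document 2 is less relevant but points in a different direction.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8]]

diverse = mmr_select(query, docs, k=2, lambda_mult=0.3)
relevance_only = mmr_select(query, docs, k=2, lambda_mult=1.0)
```

With these toy vectors, the lower `lambda_mult` skips the near-duplicate in favor of the more distinct document, while `lambda_mult=1.0` behaves like ordinary similarity ranking. That trade-off between relevance and redundancy is what distinguishes MMR from plain dense search.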
