Efficient and Reproducible Biomedical Question Answering using Retrieval Augmented Generation

Linus Stuhlmann; Michael Alexander Saxer; Jonathan Fürst

arXiv:2505.07917 · cs.IR · January 14, 2026

TL;DR

This paper evaluates retrieval-augmented generation methods for biomedical question answering, optimizing retrieval strategies and response times on large PubMed datasets to improve accuracy, efficiency, and scalability.

Contribution

The paper systematically compares retrieval strategies (BM25, BioBERT, MedCPT, and a hybrid approach) and common data stores for a biomedical RAG system, and identifies a configuration that balances accuracy against response time.

Findings

01

Retrieving 50 documents with BM25 and reranking with MedCPT balances accuracy and speed.

02

BM25 retrieval remains fast at 82 ms, while MedCPT adds computational cost.

03

The study provides insights into retrieval depth, efficiency, and scalability trade-offs.

Abstract

Biomedical question-answering (QA) systems require effective retrieval and generation components to ensure accuracy, efficiency, and scalability. This study systematically examines a Retrieval-Augmented Generation (RAG) system for biomedical QA, evaluating retrieval strategies and response time trade-offs. We first assess state-of-the-art retrieval methods, including BM25, BioBERT, MedCPT, and a hybrid approach, alongside common data stores such as Elasticsearch, MongoDB, and FAISS, on a ~10% subset of PubMed (2.4M documents) to measure indexing efficiency, retrieval latency, and retriever performance in the end-to-end RAG system. Based on these insights, we deploy the final RAG system on the full 24M PubMed corpus, comparing different retrievers' impact on overall performance. Evaluations of the retrieval depth show that retrieving 50 documents with BM25 before reranking with MedCPT…
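
The retrieve-then-rerank pattern described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `BM25` class is a small from-scratch Okapi BM25 scorer (the paper evaluates data stores such as Elasticsearch for this stage), `placeholder_rerank_score` is a hypothetical token-overlap stand-in for a neural reranker such as MedCPT, and the toy corpus, the function names, and the values of `k1`, `b`, and `final_k` are chosen only for the example. Only the default first-stage depth of 50 comes from the text.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems use analyzers)."""
    return text.lower().split()


class BM25:
    """Small in-memory Okapi BM25 scorer for illustration only."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.avgdl = sum(len(t) for t in self.tokens) / len(self.tokens)
        # Document frequency: number of documents containing each term.
        df = Counter(term for t in self.tokens for term in set(t))
        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.tokens[i])
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k):
        """Return indices of the k highest-scoring documents."""
        order = sorted(
            range(len(self.tokens)), key=lambda i: self.score(query, i), reverse=True
        )
        return order[:k]


def placeholder_rerank_score(query, doc):
    """Hypothetical stand-in for a neural reranker: Jaccard token overlap."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def retrieve_then_rerank(query, docs, bm25, depth=50, final_k=3):
    """Stage 1: BM25 selects `depth` candidates. Stage 2: rerank only those."""
    candidates = bm25.top_k(query, depth)
    reranked = sorted(
        candidates,
        key=lambda i: placeholder_rerank_score(query, docs[i]),
        reverse=True,
    )
    return reranked[:final_k]


# Toy corpus (invented for the example, not taken from PubMed).
docs = [
    "aspirin reduces risk of heart attack",
    "insulin regulates blood glucose",
    "heart attack symptoms include chest pain",
    "aspirin is an anti-inflammatory drug",
]
bm25 = BM25(docs)
print(retrieve_then_rerank("aspirin heart attack", docs, bm25, depth=3, final_k=2))
```

The design point is that the expensive scorer only ever sees `depth` candidates, so its cost stays bounded regardless of corpus size, while the cheap lexical stage scans the full index.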

Code & Models

Repositories

slinusc/medical_RAG_system (official)

Taxonomy

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Byte Pair Encoding · Attention Dropout · Softmax · Residual Connection · WordPiece