Efficient and Reproducible Biomedical Question Answering using Retrieval Augmented Generation
Linus Stuhlmann, Michael Alexander Saxer, Jonathan Fürst

TL;DR
This paper evaluates retrieval-augmented generation methods for biomedical question answering, optimizing retrieval strategies and response times on large PubMed datasets to improve accuracy, efficiency, and scalability.
Contribution
It systematically compares retrieval strategies and demonstrates an optimal balance between accuracy and response time in biomedical QA using RAG systems.
Findings
Retrieving 50 documents with BM25 and reranking with MedCPT balances accuracy and speed.
BM25 retrieval remains fast at 82 ms, while MedCPT adds computational cost.
The study provides insights into retrieval depth, efficiency, and scalability trade-offs.
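The two-stage pattern described in the findings (a fast lexical first pass, then a more expensive reranker over a fixed candidate depth) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `BM25` class is a plain-Python Okapi BM25 with commonly used default parameters (`k1=1.5`, `b=0.75`, both assumptions), and `overlap_reranker` is a hypothetical stand-in for a neural reranker such as MedCPT, used only so the example runs without external models. Only the depth of 50 comes from the summary above.

```python
import math
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    # Naive whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


class BM25:
    """Plain Okapi BM25 over an in-memory list of documents."""

    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        tokens = [tokenize(d) for d in docs]
        self.doc_lens = [len(t) for t in tokens]
        # Fall back to 1.0 so an empty corpus does not divide by zero.
        self.avgdl = (sum(self.doc_lens) / len(docs)) if docs else 1.0
        self.avgdl = self.avgdl or 1.0
        self.tfs = [Counter(t) for t in tokens]
        df = Counter()
        for t in tokens:
            df.update(set(t))
        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query: str, idx: int) -> float:
        tf = self.tfs[idx]
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[idx] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query: str, k: int) -> List[int]:
        # Score every document and keep the k best indices.
        scored = [(i, self.score(query, i)) for i in range(len(self.tfs))]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return [i for i, _ in scored[:k]]


def overlap_reranker(query: str, doc: str) -> float:
    # Hypothetical placeholder for a neural reranker: Jaccard token overlap.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if (q | d) else 0.0


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    bm25: BM25,
    rerank_score: Callable[[str, str], float],
    depth: int = 50,  # candidate depth reported in the findings above
    final_k: int = 3,  # assumed number of passages handed to the generator
) -> List[int]:
    # Stage 1: cheap lexical retrieval narrows the corpus to `depth` candidates.
    candidates = bm25.top_k(query, depth)
    # Stage 2: the costlier scorer only sees those candidates.
    reranked = sorted(
        candidates, key=lambda i: rerank_score(query, docs[i]), reverse=True
    )
    return reranked[:final_k]


if __name__ == "__main__":
    corpus = [
        "aspirin reduces headache pain",
        "insulin regulates blood glucose",
        "aspirin may cause stomach bleeding in some patients",
    ]
    index = BM25(corpus)
    print(retrieve_then_rerank("aspirin headache", corpus, index, overlap_reranker))
```

The design point is that the reranker's cost scales with `depth` rather than with corpus size, which is why the choice of retrieval depth drives the accuracy and latency trade-off.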
Abstract
Biomedical question-answering (QA) systems require effective retrieval and generation components to ensure accuracy, efficiency, and scalability. This study systematically examines a Retrieval-Augmented Generation (RAG) system for biomedical QA, evaluating retrieval strategies and response time trade-offs. We first assess state-of-the-art retrieval methods, including BM25, BioBERT, MedCPT, and a hybrid approach, alongside common data stores such as Elasticsearch, MongoDB, and FAISS, on a ~10% subset of PubMed (2.4M documents) to measure indexing efficiency, retrieval latency, and retriever performance in the end-to-end RAG system. Based on these insights, we deploy the final RAG system on the full 24M PubMed corpus, comparing different retrievers' impact on overall performance. Evaluations of the retrieval depth show that retrieving 50 documents with BM25 before reranking with MedCPT…
Taxonomy
Methods
Attention Is All You Need · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Byte Pair Encoding · Attention Dropout · Softmax · Residual Connection · WordPiece
