Benchmarking Retrieval-Augmented Generation for Medicine
Guangzhi Xiong, Qiao Jin, Zhiyong Lu, Aidong Zhang

TL;DR
This paper introduces MIRAGE, a benchmark of 7,663 questions from five medical QA datasets for evaluating retrieval-augmented generation (RAG) systems in medicine. Experiments with the accompanying MedRAG toolkit show accuracy gains of up to 18% and reveal log-linear scaling and 'lost-in-the-middle' effects.
Contribution
The paper presents MIRAGE, which the authors describe as a first-of-its-kind benchmark for medical RAG systems, introduces the MedRAG toolkit, and reports large-scale experimental findings and best practices for medical question answering.
Findings
MedRAG improves the accuracy of LLMs by up to 18%.
Combining multiple medical corpora and retrievers yields the best performance.
The experiments identify a log-linear scaling property and 'lost-in-the-middle' effects in medical RAG.
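To make the log-linear scaling finding concrete, here is a minimal sketch, assuming the relationship is between accuracy and the number of retrieved snippets k (the summary above does not say which quantity is scaled). Under that assumption, accuracy would follow roughly acc = a + b * ln(k), so each doubling of k adds a constant b * ln(2). The function name, the coefficients, and the data points are hypothetical; the points are generated from the formula itself and are not results from the paper.

```python
import math

def fit_log_linear(ks, accs):
    """Ordinary least-squares fit of acc = a + b * ln(k); returns (a, b)."""
    xs = [math.log(k) for k in ks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic, illustrative data generated from the formula (NOT measured results).
TRUE_A, TRUE_B = 0.60, 0.03
ks = [1, 2, 4, 8, 16, 32]
accs = [TRUE_A + TRUE_B * math.log(k) for k in ks]

a, b = fit_log_linear(ks, accs)
# Under a log-linear law, every doubling of k adds the same amount of accuracy.
gain_per_doubling = b * math.log(2)
print(f"a={a:.3f}, b={b:.3f}, gain per doubling={gain_per_doubling:.4f}")
```

On real measurements the fit would not be exact, and the residuals would show how well the log-linear description holds.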
Abstract
While large language models (LLMs) have achieved state-of-the-art performance on a wide range of medical question answering (QA) tasks, they still face challenges with hallucinations and outdated knowledge. Retrieval-augmented generation (RAG) is a promising solution and has been widely adopted. However, a RAG system can involve multiple flexible components, and there is a lack of best practices regarding the optimal RAG setting for various medical purposes. To systematically evaluate such systems, we propose the Medical Information Retrieval-Augmented Generation Evaluation (MIRAGE), a first-of-its-kind benchmark including 7,663 questions from five medical QA datasets. Using MIRAGE, we conducted large-scale experiments with over 1.8 trillion prompt tokens on 41 combinations of different corpora, retrievers, and backbone LLMs through the MedRAG toolkit introduced in this work. Overall,…
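The abstract mentions combinations of different retrievers. One widely used way to merge ranked lists from several retrievers is reciprocal rank fusion (RRF), sketched below. The text above does not state which fusion method the MedRAG toolkit uses, so treat the choice of RRF as an assumption; the function name, document IDs, and rankings are hypothetical. The constant k=60 is the value conventionally used with RRF.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Returns IDs sorted by descending fused score,
    with ties broken alphabetically for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical outputs of a lexical retriever and a dense retriever.
lexical_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([lexical_ranking, dense_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents ranked well by both retrievers rise to the top, while a document found by only one retriever still stays in the fused list.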
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Biomedical Text Mining and Ontologies
Methods: Linear Warmup With Linear Decay · Linear Layer · WordPiece · Byte Pair Encoding · Attention Dropout · Dense Connections · Cosine Annealing
