MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
Peng Xia, Kangyu Zhu, Haoran Li, Tianze Wang, Weijia Shi, Sheng Wang,, Linjun Zhang, James Zou, Huaxiu Yao

TL;DR
This paper introduces MMed-RAG, a versatile multimodal retrieval-augmented generation system that significantly improves factual accuracy in medical vision-language models across various medical domains.
Contribution
The paper presents a novel domain-aware retrieval, adaptive context selection, and preference fine-tuning strategy to enhance the generality and reliability of RAG in medical applications.
Findings
Achieves 43.8% improvement in factual accuracy on medical VQA and report generation tasks.
Demonstrates effectiveness across five diverse medical datasets.
Enhances alignment and reduces hallucination in medical vision-language models.
Abstract
Artificial Intelligence (AI) has demonstrated significant potential in healthcare, particularly in disease diagnosis and treatment planning. Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. However, these models often suffer from factual hallucination, which can lead to incorrect diagnoses. Fine-tuning and retrieval-augmented generation (RAG) have emerged as methods to address these issues. However, the amount of high-quality data and distribution shifts between training data and deployment data limit the application of fine-tuning methods. Although RAG is lightweight and effective, existing RAG-based approaches are not sufficiently general to different medical domains and can potentially cause misalignment issues, both between modalities and between the model and the ground truth. In this paper, we…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Topic Modeling · Biomedical Text Mining and Ontologies
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Linear Layer · Byte Pair Encoding · Softmax · Multi-Head Attention · WordPiece · Dropout · Layer Normalization · Adam · Attention Dropout
