RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
Peng Xia, Kangyu Zhu, Haoran Li, Hongtu Zhu, Yun Li, Gang Li, Linjun Zhang, Huaxiu Yao

TL;DR
RULE improves factual accuracy in medical vision-language models by calibrating retrieval and fine-tuning the model to balance reliance on its internal knowledge against retrieved external data.
Contribution
The paper presents a novel approach with calibrated retrieval and fine-tuning strategies to address factuality issues in Medical Large Vision Language Models.
Findings
Achieved an average improvement of 47.4% in factual accuracy across tasks.
Demonstrated effectiveness on medical VQA and report generation datasets.
Publicly released benchmark and code for reproducibility.
Abstract
The recent emergence of Medical Large Vision Language Models (Med-LVLMs) has enhanced medical diagnosis. However, current Med-LVLMs frequently encounter factual issues, often generating responses that do not align with established medical facts. Retrieval-Augmented Generation (RAG), which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challenges. First, limited retrieved contexts might not cover all necessary information, while excessive retrieval can introduce irrelevant and inaccurate references, interfering with the model's generation. Second, in cases where the model originally responds correctly, applying RAG can lead to an over-reliance on retrieved contexts, resulting in incorrect answers. To address these issues, we propose RULE, which consists of two components. First, we introduce a provably effective strategy for…
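The retrieval trade-off described in the abstract can be illustrated with a minimal sketch. This is not the RULE method itself (the abstract above is truncated before the method is described); it is plain top-k retrieval by cosine similarity, and the function names, toy report texts, and three-dimensional embeddings are all hypothetical. It shows only that raising k admits contexts with lower similarity to the query.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query, corpus, k):
    """Return the k (score, text) pairs most similar to the query."""
    scored = [(cosine(query, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy corpus: (text, embedding) pairs, not real data.
corpus = [
    ("report A: cardiomegaly noted", [0.9, 0.1, 0.0]),
    ("report B: enlarged cardiac silhouette", [0.8, 0.2, 0.1]),
    ("report C: no acute fracture", [0.1, 0.9, 0.2]),
    ("report D: normal retina", [0.0, 0.1, 0.9]),
]
query = [1.0, 0.0, 0.0]  # hypothetical embedding of a cardiac question

for k in (1, 2, 4):
    hits = retrieve_top_k(query, corpus, k)
    weakest_score = hits[-1][0]  # similarity of the least relevant context kept
    print(f"k={k}: weakest retrieved similarity = {weakest_score:.3f}")
```

With a small k, a relevant report may be left out; with a large k, the context passed to the model includes reports with little relation to the question.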
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · WordPiece · Residual Connection · Byte Pair Encoding · Layer Normalization · Attention Dropout
