RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Peng Xia; Kangyu Zhu; Haoran Li; Hongtu Zhu; Yun Li; Gang Li; Linjun Zhang; Huaxiu Yao

arXiv:2407.05131·cs.LG·October 18, 2024·1 cites

TL;DR

RULE improves the factual accuracy of medical vision-language models by calibrating retrieval and fine-tuning the model to balance its reliance on internal knowledge against retrieved external data, yielding an average 47.4% improvement in factual accuracy.

Contribution

The paper presents a novel approach with calibrated retrieval and fine-tuning strategies to address factuality issues in Medical Large Vision Language Models.

Findings

01 Achieved an average 47.4% improvement in factual accuracy across tasks.

02 Demonstrated effectiveness on medical VQA and report generation datasets.

03 Publicly released benchmark and code for reproducibility.

Abstract

The recent emergence of Medical Large Vision Language Models (Med-LVLMs) has enhanced medical diagnosis. However, current Med-LVLMs frequently encounter factual issues, often generating responses that do not align with established medical facts. Retrieval-Augmented Generation (RAG), which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challenges. First, limited retrieved contexts might not cover all necessary information, while excessive retrieval can introduce irrelevant and inaccurate references, interfering with the model's generation. Second, in cases where the model originally responds correctly, applying RAG can lead to an over-reliance on retrieved contexts, resulting in incorrect answers. To address these issues, we propose RULE, which consists of two components. First, we introduce a provably effective strategy for…
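
The first challenge in the abstract, too few retrieved contexts versus too many, can be illustrated with a toy top-k retriever. This is a minimal sketch and not the RULE method: the `cosine` and `retrieve_top_k` helpers, the three-dimensional embeddings, and the report snippets are all hypothetical, chosen only to show how raising `k` starts to admit low-similarity references.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec, corpus, k):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings: the first two reports are relevant to the query,
# the last two are not.
corpus = [
    ("report A: cardiomegaly noted", [0.9, 0.1, 0.0]),
    ("report B: enlarged cardiac silhouette", [0.8, 0.2, 0.1]),
    ("report C: no acute fracture", [0.1, 0.9, 0.2]),
    ("report D: dermatology follow-up", [0.0, 0.1, 0.9]),
]
query = [1.0, 0.0, 0.0]

for k in (1, 2, 4):
    hits = retrieve_top_k(query, corpus, k)
    print(k, [(round(score, 2), text) for score, text in hits])
```

With `k=1` only one relevant report is returned, so supporting evidence may be missed; with `k=4` the unrelated reports C and D enter the context as well, which is the kind of irrelevant reference the abstract warns can interfere with generation.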

Code & Models

Repositories

richard-peng-xia/rule
PyTorch · Official

Videos

RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models · Underline

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · WordPiece · Residual Connection · Byte Pair Encoding · Layer Normalization · Attention Dropout