LVLM-Aware Multimodal Retrieval for RAG-Based Medical Diagnosis with General-Purpose Models

Nir Mazor; Tom Hope

arXiv:2508.17394 · cs.CV · April 21, 2026

TL;DR

This paper introduces a lightweight multimodal retrieval system for medical diagnosis that improves performance on clinical classification and VQA tasks using only general-purpose models, and it analyzes a class of retrieval errors.

Contribution

It presents a novel lightweight LVLM-aware multimodal retriever trained with minimal data, enhancing retrieval-augmented diagnosis without extensive medical pre-training.

Findings

01 Retrieval optimization improves predictions in inconsistent retrieval cases.

02 Lightweight fine-tuning on small amounts of data achieves results competitive with medically pre-trained models.

03 Analysis reveals that LVLMs have difficulty using retrieved information in their predictions.

Abstract

Retrieving visual and textual information from medical literature and hospital records can enhance diagnostic accuracy for clinical image interpretation. However, multimodal retrieval-augmented diagnosis is highly challenging. We explore a lightweight mechanism for enhancing diagnostic performance of retrieval-augmented LVLMs. We train a lightweight LVLM-aware multimodal retriever, such that the retriever learns to return images and texts that guide the LVLM toward correct predictions. In our low-resource setting, we perform only lightweight fine-tuning with small amounts of data, and use only general-purpose backbone models, achieving competitive results in clinical classification and VQA tasks compared to medically pre-trained models with extensive training. In a novel analysis, we highlight a previously unexplored class of errors that we term inconsistent retrieval predictions: cases…
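The idea of a retriever that "learns to return images and texts that guide the LVLM toward correct predictions" can be illustrated with a small sketch. This is not the authors' training objective, which the abstract does not specify. It is a generic distillation-style loss, assuming that for each retrieved candidate we can measure the probability the LVLM assigns to the correct answer when that candidate is placed in its context. The function names (`softmax`, `lvlm_aware_loss`) and all inputs are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw retriever scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def lvlm_aware_loss(retriever_scores, lvlm_correct_probs, temperature=1.0):
    """KL(target || retriever) over one query's retrieved candidates.

    retriever_scores:   similarity score the retriever gives each candidate.
    lvlm_correct_probs: ASSUMED feedback signal, the probability the LVLM
                        assigns to the correct answer when conditioned on
                        that candidate (hypothetical; not from the paper).
    """
    p_retriever = softmax(retriever_scores, temperature)
    # Target: candidates that make the LVLM more confident in the correct
    # answer receive more probability mass.
    total = sum(lvlm_correct_probs)
    target = [p / total for p in lvlm_correct_probs]
    return sum(
        t * (math.log(t) - math.log(r))
        for t, r in zip(target, p_retriever)
        if t > 0
    )


# Candidate 0 helps the LVLM most; a retriever that ranks it first
# incurs a lower loss than one that ranks it last.
helpfulness = [0.8, 0.15, 0.05]
print(lvlm_aware_loss([2.0, 0.0, -1.0], helpfulness))
print(lvlm_aware_loss([-1.0, 0.0, 2.0], helpfulness))
```

Minimizing a loss of this shape pushes the retriever's ranking toward the candidates that actually help the downstream model, rather than toward candidates that are merely similar to the query. In a real system the scores would come from a trainable encoder and the loss would be backpropagated through it.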

Code & Models

Repositories

Nirmaz/CLARE (GitHub)
