Med-GRIM: Enhanced Zero-Shot Medical VQA using prompt-embedded Multimodal Graph RAG
Rakesh Raj Madavan, Akshat Kaimal, Hashim Faisal, Chandrakala S

TL;DR
Med-GRIM introduces a modular, knowledge-injected approach for medical visual question answering that enhances accuracy and efficiency without extensive fine-tuning, supported by a new dermatology dataset.
Contribution
The paper presents Med-GRIM, a zero-shot medical VQA model that combines graph-based retrieval with prompt engineering, and introduces DermaGraph, a dermatology dataset for multimodal research.
Findings
Med-GRIM achieves high accuracy with low computational cost.
The model effectively integrates domain knowledge via prompt-based retrieval.
DermaGraph enables scalable research in dermatological multimodal applications.
Abstract
An ensemble of trained multimodal encoders and vision-language models (VLMs) has become a standard approach for visual question answering (VQA) tasks. However, such models often fail to produce responses with the detailed precision necessary for complex, domain-specific applications such as medical VQA. Our representation model, BIND: BLIVA Integrated with Dense Encoding, extends prior multimodal work by refining the joint embedding space through dense, query-token-based encodings inspired by contrastive pretraining techniques. This refined encoder powers Med-GRIM, a model designed for medical VQA tasks that leverages graph-based retrieval and prompt engineering to integrate domain-specific knowledge. Rather than relying on compute-heavy fine-tuning of vision and language models on specific datasets, Med-GRIM applies a low-compute, modular workflow with small language models (SLMs) for…
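The abstract's combination of graph-based retrieval and prompt engineering can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the three-dimensional vectors standing in for joint image-question embeddings, and the names `GRAPH`, `retrieve`, and `build_prompt` are all hypothetical, chosen only to show the general pattern of retrieving graph nodes by similarity, expanding to their neighbours, and embedding the retrieved knowledge in a prompt for a language model.

```python
import math

# Hypothetical toy knowledge graph (not DermaGraph):
# node -> (embedding, description, neighbouring nodes).
GRAPH = {
    "psoriasis": ([0.9, 0.1, 0.0], "Scaly, well-demarcated plaques.", ["eczema"]),
    "eczema": ([0.7, 0.3, 0.1], "Itchy, inflamed patches.", ["psoriasis"]),
    "melanoma": ([0.0, 0.2, 0.9], "Irregular pigmented lesion.", []),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, k=1):
    """Return the k most similar nodes plus their direct neighbours."""
    ranked = sorted(GRAPH, key=lambda n: cosine(query_vec, GRAPH[n][0]), reverse=True)
    nodes = list(ranked[:k])
    for seed in ranked[:k]:
        for neighbour in GRAPH[seed][2]:
            if neighbour not in nodes:
                nodes.append(neighbour)
    return nodes


def build_prompt(question, nodes):
    """Embed the retrieved node descriptions in a prompt for a language model."""
    context = "\n".join(f"- {n}: {GRAPH[n][1]}" for n in nodes)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Assumed query embedding; in practice it would come from a multimodal encoder.
nodes = retrieve([1.0, 0.0, 0.0], k=1)
prompt = build_prompt("What condition does the image show?", nodes)
print(prompt)
```

In this sketch the language model only sees the retrieved context, so domain knowledge is injected at inference time rather than through fine-tuning, which is the general idea the abstract describes.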
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Topic Modeling
