Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering

Danfeng Guo; Demetri Terzopoulos

arXiv:2407.21368·cs.CV·March 18, 2025

Open Access

TL;DR

This paper introduces two prompting strategies for Medical Large Vision-Language Models to reduce hallucinations and enhance diagnosis accuracy in visual question answering tasks, demonstrating significant improvements on medical datasets.

Contribution

It proposes two prompting techniques for MLVLMs: supplying a detailed textual explanation of the queried pathology, and supplying the judgment of a cheap, fine-tuned weak learner. Both aim to improve performance and reduce hallucinations in medical diagnosis tasks.

Findings

01. Significant increase in diagnostic F1 score, up to 0.27.

02. Improved recall by approximately 0.07.

03. Effective extension of strategies to general LVLM domains.
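For readers less familiar with the metrics named in the findings, the following is a minimal sketch of how recall and F1 are computed from binary yes/no predictions. The confusion counts are invented for illustration and are not results from the paper.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of true positives that the model recovers."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written in count form."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts for one pathology (NOT taken from the paper).
tp, fp, fn = 30, 10, 20
print(f"precision={precision(tp, fp):.3f}")
print(f"recall={recall(tp, fn):.3f}")
print(f"f1={f1_score(tp, fp, fn):.3f}")
```

Because recall depends on false negatives, a prompting strategy that makes a model less prone to answering "no" for a pathology that is present raises recall directly.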

Abstract

Large Vision-Language Models (LVLMs) have achieved significant success in recent years, and they have been extended to the medical domain. Although demonstrating satisfactory performance on medical Visual Question Answering (VQA) tasks, Medical LVLMs (MLVLMs) suffer from the hallucination problem, which makes them fail to diagnose complex pathologies. Moreover, they readily fail to learn minority pathologies due to imbalanced training data. We propose two prompting strategies for MLVLMs that reduce hallucination and improve VQA performance. In the first strategy, we provide a detailed explanation of the queried pathology. In the second strategy, we fine-tune a cheap, weak learner to achieve high performance on a specific metric, and textually provide its judgment to the MLVLM. Tested on the MIMIC-CXR-JPG and CheXpert datasets, our methods significantly improve the diagnostic F1 score,…
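The two strategies described in the abstract can be pictured as extra text placed ahead of the VQA question. The sketch below is only an illustration under stated assumptions: the function name `build_prompt`, the prompt wording, and the example explanation are all hypothetical and are not the authors' actual templates. It shows only the text-assembly step, not the model call or the weak learner itself.

```python
from typing import Optional


def build_prompt(pathology: str,
                 explanation: Optional[str] = None,
                 weak_learner_positive: Optional[bool] = None) -> str:
    """Assemble a yes/no diagnostic question with optional added context.

    explanation: strategy 1, a textual description of the queried pathology.
    weak_learner_positive: strategy 2, the binary judgment of a separately
        fine-tuned weak classifier, passed to the model as text.
    All wording here is hypothetical, not the paper's template.
    """
    parts = []
    if explanation:
        parts.append(f"Background: {explanation}")
    if weak_learner_positive is not None:
        verdict = "present" if weak_learner_positive else "absent"
        parts.append(
            f"Reference: a separate classifier judged {pathology} to be "
            f"{verdict} in this image. Treat this as a hint."
        )
    parts.append(
        f"Question: Does this chest X-ray show {pathology}? Answer yes or no."
    )
    return "\n".join(parts)


# Example with both strategies enabled (illustrative inputs only).
print(build_prompt(
    "atelectasis",
    explanation="Atelectasis is a partial or complete collapse of a lung or lobe.",
    weak_learner_positive=True,
))
```

Keeping both arguments optional makes it straightforward to compare a plain question, each strategy alone, and the two combined.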

Taxonomy

Topics: Multimodal Machine Learning Applications