Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering
Danfeng Guo, Demetri Terzopoulos

TL;DR
This paper introduces two prompting strategies for Medical Large Vision-Language Models (MLVLMs) that reduce hallucination and improve diagnostic accuracy on visual question answering (VQA) tasks, with significant gains on the MIMIC-CXR-JPG and CheXpert chest X-ray datasets.
Contribution
It proposes two prompting techniques that improve MLVLM performance and reduce hallucination in medical diagnosis: (1) supplying a detailed explanation of the queried pathology, and (2) fine-tuning a cheap weak learner and passing its judgment to the MLVLM as text.
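To make the two techniques concrete, here is a minimal sketch of how a yes/no diagnostic prompt might combine them. The template wording, the function name `build_prompt`, and the `PATHOLOGY_EXPLANATIONS` table are all hypothetical illustrations, not the authors' actual prompts; the MLVLM call itself is omitted.

```python
from typing import Optional

# Hypothetical example explanation; the paper's actual explanation texts are not given here.
PATHOLOGY_EXPLANATIONS = {
    "pleural effusion": (
        "Pleural effusion is an abnormal accumulation of fluid in the pleural space. "
        "On a chest X-ray it often appears as blunting of the costophrenic angle."
    ),
}


def build_prompt(pathology: str,
                 explanation: Optional[str] = None,
                 weak_learner_positive: Optional[bool] = None) -> str:
    """Assemble a yes/no VQA prompt (hypothetical template, not the paper's wording)."""
    parts = []
    if explanation is not None:
        # Strategy 1: give the model a detailed description of the queried pathology.
        parts.append(f"Background: {explanation}")
    if weak_learner_positive is not None:
        # Strategy 2: pass the weak learner's binary judgment to the model as plain text.
        verdict = "present" if weak_learner_positive else "absent"
        parts.append(f"A separate classifier judges that {pathology} is {verdict} in this image.")
    parts.append(f"Question: Is there {pathology} in this image? Answer yes or no.")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("pleural effusion",
                       explanation=PATHOLOGY_EXPLANATIONS["pleural effusion"],
                       weak_learner_positive=True))
```

Keeping both strategies optional lets the same template express a plain baseline question, either strategy alone, or both together, which is a convenient setup for comparing them.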
Findings
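Diagnostic F1 score improves significantly, with a maximum gain of 0.27.
On general LVLMs, the strategies suppress false negatives and improve recall on POPE by approximately 0.07.
The strategies extend effectively from the medical domain to general LVLM domains.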
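F1 and recall are the standard binary-classification metrics. The sketch below computes them for yes/no diagnostic answers, treating "yes" (pathology present) as the positive class. That class choice and the per-question framing are assumptions; this summary does not state how the paper averages scores across pathologies.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall, and F1, with 1 = pathology present (assumed positive class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)        # correct "yes"
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)    # false alarm
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)    # missed pathology
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, on labels `[1, 0, 0, 0, 1]` a model that always answers "no" is 60% accurate yet scores 0 recall and 0 F1, which is why false negatives on minority pathologies weigh so heavily on these metrics.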
Abstract
Large Vision-Language Models (LVLMs) have achieved significant success in recent years, and they have been extended to the medical domain. Although demonstrating satisfactory performance on medical Visual Question Answering (VQA) tasks, Medical LVLMs (MLVLMs) suffer from the hallucination problem, which makes them fail to diagnose complex pathologies. Moreover, they readily fail to learn minority pathologies due to imbalanced training data. We propose two prompting strategies for MLVLMs that reduce hallucination and improve VQA performance. In the first strategy, we provide a detailed explanation of the queried pathology. In the second strategy, we fine-tune a cheap, weak learner to achieve high performance on a specific metric, and textually provide its judgment to the MLVLM. Tested on the MIMIC-CXR-JPG and CheXpert datasets, our methods significantly improve the diagnostic F1 score,…
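One plausible reading of "fine-tune a cheap, weak learner to achieve high performance on a specific metric" is to calibrate the learner's decision threshold on held-out data so that it reaches a chosen recall. The sketch below assumes that reading: the metric (recall), the function name `threshold_for_recall`, and the thresholding procedure are illustrative assumptions, not the paper's confirmed method, and the classifier training itself is omitted.

```python
import math
from typing import Sequence


def threshold_for_recall(scores: Sequence[float], labels: Sequence[int],
                         target_recall: float) -> float:
    """Highest threshold t such that predicting `score >= t` reaches target_recall.

    Assumption: the 'specific metric' is recall; the paper's choice is not stated here.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    # Scores of the truly positive validation cases, highest first.
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    # How many positives must be caught; the small epsilon guards against
    # floating-point products such as 0.7 * 10 == 7.000000000000001.
    needed = max(1, math.ceil(target_recall * len(positives) - 1e-9))
    # Any threshold above this score would catch fewer than `needed` positives.
    return positives[needed - 1]
```

With validation scores `[0.9, 0.8, 0.3, 0.2, 0.7, 0.1]` and labels `[1, 1, 1, 0, 0, 0]`, a target recall of 1.0 yields a threshold of 0.3, and a target of 0.6 yields 0.8. Favoring recall trades away some precision; a plausible rationale, not confirmed here, is that the MLVLM can then reject the weak learner's false alarms while being steered away from missed pathologies.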
Taxonomy
Topics: Multimodal Machine Learning Applications
