MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale

Xiaotang Gai; Chenyi Zhou; Jiaxiang Liu; Yang Feng; Jian Wu; Zuozhu Liu

arXiv:2404.12372·cs.CV·October 8, 2024·2 cites

Open Access

TL;DR

MedThink introduces a framework for medical visual question answering that incorporates decision-making rationales, improving interpretability and accuracy on new benchmark datasets built from existing medical imaging VQA datasets.

Contribution

The paper presents a semi-automated annotation process, new benchmark datasets with rationales, and a lightweight finetuned model that enhances interpretability and performance in MedVQA.

Findings

01. Achieved over 83% accuracy on the new benchmark datasets.

02. Outperformed existing state-of-the-art models with a comparable number of parameters.

03. Released the datasets and code for further research.

Abstract

Medical Visual Question Answering (MedVQA), which offers language responses to image-based medical inquiries, represents a challenging task and significant advancement in healthcare. It assists medical experts to swiftly interpret medical images, thereby enabling faster and more accurate diagnoses. However, the model interpretability and transparency of existing MedVQA solutions are often limited, posing challenges in understanding their decision-making processes. To address this issue, we devise a semi-automated annotation process to streamline data preparation and build new benchmark MedVQA datasets R-RAD, R-SLAKE and R-Path. These datasets provide intermediate medical decision-making rationales generated by multimodal large language models and human annotations for question-answering pairs in existing MedVQA datasets, i.e., VQA-RAD, SLAKE and PathVQA. Moreover, we design a novel…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques