PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

Xiaoman Zhang; Chaoyi Wu; Ziheng Zhao; Weixiong Lin; Ya Zhang; Yanfeng Wang; Weidi Xie

arXiv:2305.10415 · cs.CV · September 10, 2024 · 58 cites

TL;DR

This paper introduces PMC-VQA, a large-scale medical visual question answering dataset and a generative model that improves accuracy in medical image interpretation, advancing AI capabilities in healthcare diagnostics.

Contribution

The paper presents a scalable pipeline for creating a large medical VQA dataset and a generative model that outperforms existing methods on multiple benchmarks.

Findings

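01. Significantly outperforms existing MedVQA models on public benchmarks, including VQA-RAD, SLAKE, and Image-Clef-2019

02. Provides a large-scale, diverse medical VQA dataset (227k VQA pairs over 149k images)

03. Establishes a new benchmark with manual verification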

Abstract

Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery by leveraging artificial intelligence to interpret and answer questions based on medical images. In this study, we reframe the problem of MedVQA as a generation task that naturally follows human-machine interaction, and propose a generative model for medical visual understanding that aligns visual information from a pre-trained vision encoder with a large language model. We establish a scalable pipeline to construct a large-scale medical visual question-answering dataset, named PMC-VQA, which contains 227k VQA pairs over 149k images covering various modalities and diseases. We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef-2019, significantly outperforming existing…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
