Free Form Medical Visual Question Answering in Radiology

Abhishek Narayanan; Rushabh Musthyala; Rahul Sankar; Anirudh Prasad Nistala; Pranav Singh; Jacopo Cirrone

arXiv:2401.13081·cs.CV·January 25, 2024·1 cites

Open Access

TL;DR

This paper develops a more versatile model for medical visual question answering in radiology. The model learns joint representations of radiology images and question text, and performs comparably to state-of-the-art methods with a less complex architecture.

Contribution

It introduces a novel approach to joint multimodal representation learning and augments the SLAKE dataset for broader question answering in radiology.

Findings

01

Achieved 79.55% top-1 accuracy with a less complex model

02

Enhanced dataset enables answering more diverse questions

03

Model performance is comparable to state-of-the-art methods

Abstract

Visual Question Answering (VQA) in the medical domain presents a unique, interdisciplinary challenge, combining fields such as Computer Vision, Natural Language Processing, and Knowledge Representation. Despite its importance, research in medical VQA has been scant, only gaining momentum since 2018. Addressing this gap, our research delves into the effective representation of radiology images and the joint learning of multimodal representations, surpassing existing methods. We innovatively augment the SLAKE dataset, enabling our model to respond to a more diverse array of questions, not limited to the immediate content of radiology or pathology images. Our model achieves a top-1 accuracy of 79.55% with a less complex architecture, demonstrating comparable performance to current state-of-the-art models. This research not only advances medical VQA but also opens avenues for practical…
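The headline figure in the abstract is a top-1 accuracy: the fraction of questions for which the model's single highest-ranked answer matches the reference answer. A minimal sketch of that metric follows. The function name `top1_accuracy` and the case- and whitespace-insensitive exact-match rule are assumptions made for illustration; the paper's own evaluation protocol may normalize or match answers differently.

```python
from typing import Sequence


def _normalize(answer: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(answer.lower().split())


def top1_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of questions whose top-ranked answer matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one question is required")
    correct = sum(
        _normalize(p) == _normalize(r) for p, r in zip(predicted, reference)
    )
    return correct / len(predicted)


if __name__ == "__main__":
    # Toy answers, not taken from SLAKE or from the paper's results.
    preds = ["Lung", "yes", "liver"]
    refs = ["lung", "no", "Liver"]
    print(f"top-1 accuracy: {top1_accuracy(preds, refs):.2%}")
```

Under this definition, a reported 79.55% means that roughly four out of five test questions received the correct answer as the model's first choice.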

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning