Free Form Medical Visual Question Answering in Radiology
Abhishek Narayanan, Rushabh Musthyala, Rahul Sankar, Anirudh Prasad Nistala, Pranav Singh, Jacopo Cirrone

TL;DR
This paper advances medical visual question answering in radiology by developing a more versatile model that effectively integrates multimodal data, surpasses existing methods, and enhances diagnostic potential.
Contribution
It introduces a novel approach to joint multimodal representation learning and augments the SLAKE dataset for broader question answering in radiology.
Findings
Achieved 79.55% top-1 accuracy with a less complex model
Enhanced dataset enables answering more diverse questions
Model performance is comparable to state-of-the-art methods
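For readers unfamiliar with the metric, top-1 accuracy is the fraction of questions for which the single highest-scoring answer is the correct one. The sketch below is illustrative only and is not the authors' evaluation code. It assumes answering is treated as classification over a fixed answer vocabulary, and the function name `top1_accuracy` and the toy scores are hypothetical.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the fraction of rows whose highest score sits at the label index.

    scores: one row of per-answer scores for each question (assumed layout).
    labels: index of the correct answer for each question.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest-scoring candidate answer (first one wins ties).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy data: 4 questions, 3 candidate answers each (made-up numbers).
toy_scores = [
    [0.1, 0.7, 0.2],  # predicts answer 1
    [0.8, 0.1, 0.1],  # predicts answer 0
    [0.3, 0.3, 0.4],  # predicts answer 2
    [0.6, 0.3, 0.1],  # predicts answer 0
]
toy_labels = [1, 0, 2, 1]  # the last question is answered incorrectly

print(top1_accuracy(toy_scores, toy_labels))
```

On this toy data three of the four predictions match, so the function returns 0.75. The 79.55% reported above would be the same ratio computed over the paper's test questions.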
Abstract
Visual Question Answering (VQA) in the medical domain presents a unique, interdisciplinary challenge, combining fields such as Computer Vision, Natural Language Processing, and Knowledge Representation. Despite its importance, research in medical VQA has been scant, only gaining momentum since 2018. Addressing this gap, our research delves into the effective representation of radiology images and the joint learning of multimodal representations, surpassing existing methods. We innovatively augment the SLAKE dataset, enabling our model to respond to a more diverse array of questions, not limited to the immediate content of radiology or pathology images. Our model achieves a top-1 accuracy of 79.55% with a less complex architecture, demonstrating comparable performance to current state-of-the-art models. This research not only advances medical VQA but also opens avenues for practical…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
