MMBERT: Multimodal BERT Pretraining for Improved Medical VQA

Yash Khare; Viraj Bagal; Minesh Mathew; Adithi Devi; U Deva Priyakumar; CV Jawahar

arXiv:2104.01394 · cs.CV · April 6, 2021

TL;DR

MMBERT introduces a self-supervised multimodal pretraining approach for medical VQA, leveraging image and text data to improve accuracy and interpretability in radiology question answering tasks.

Contribution

The paper presents a multimodal BERT pretraining method tailored to medical images and their captions, achieving state-of-the-art results on two radiology VQA datasets.

Findings

01. Achieved new state-of-the-art performance on the VQA-Med 2019 and VQA-RAD datasets.

02. Outperformed even the ensemble models of the previous best solutions.

03. Provided attention maps for model interpretability.

Abstract

Images in the medical domain are fundamentally different from general-domain images. Consequently, it is infeasible to directly employ general-domain Visual Question Answering (VQA) models in the medical domain. Additionally, medical image annotation is a costly and time-consuming process. To overcome these limitations, we propose a solution inspired by self-supervised pretraining of Transformer-style architectures for NLP, Vision and Language tasks. Our method involves learning richer medical image and text semantic representations using Masked Language Modeling (MLM) with image features as the pretext task on a large medical image+caption dataset. The proposed solution achieves new state-of-the-art performance on two VQA datasets for radiology images -- VQA-Med 2019 and VQA-RAD, outperforming even the ensemble models of previous best solutions. Moreover, our solution provides…
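
The pretext task named in the abstract, Masked Language Modeling with image features, can be illustrated with a minimal sketch of how one training example might be assembled. This is not the authors' code: the function name `build_mlm_example`, the special tokens, the image placeholders `[IMG1]`/`[IMG2]`, and the 15% masking rate are all assumptions chosen for illustration, and the real model would feed image feature vectors rather than string placeholders.

```python
import random

MASK, IGNORE = "[MASK]", None  # hypothetical mask token and "no loss here" label


def build_mlm_example(image_slots, caption_tokens, mask_prob=0.15, rng=None):
    """Build one masked multimodal input and its per-position labels.

    image_slots: placeholders standing in for image feature vectors
                 (assumption: they are never masked).
    caption_tokens: caption words; each is masked with probability mask_prob.
    """
    rng = rng or random.Random()
    inputs = ["[CLS]", *image_slots, "[SEP]"]
    labels = [IGNORE] * len(inputs)
    for token in caption_tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)    # hide the word from the model
            labels.append(token)   # target to predict from image + text context
        else:
            inputs.append(token)
            labels.append(IGNORE)  # unmasked positions contribute no loss
    inputs.append("[SEP]")
    labels.append(IGNORE)
    return inputs, labels


caption = "chest x-ray showing a right pleural effusion".split()
inputs, labels = build_mlm_example(["[IMG1]", "[IMG2]"], caption, rng=random.Random(0))
for position, (token, label) in enumerate(zip(inputs, labels)):
    print(position, token, label)
```

Because the image slots stay visible while caption words are hidden, a model trained on such examples has to use the image to recover the missing text, which is the intuition behind using this task to learn joint image and text representations.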

Code & Models

Repositories

VirajBagal/MMBERT (PyTorch, official)