BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis

Masoud Monajatipoor, Mozhdeh Rouhsedaghat, Liunian Harold Li, Aichi Chien, C.-C. Jay Kuo, Fabien Scalzo, and Kai-Wei Chang

arXiv:2108.04938 · cs.CV · August 12, 2021 · 1 citation

TL;DR

BERTHop is a transformer-based vision-and-language model tailored for chest X-ray disease diagnosis, effectively capturing modality associations and outperforming state-of-the-art models on a standard benchmark.

Contribution

The paper introduces BERTHop, a novel model combining PixelHop++ and VisualBERT, specifically designed to improve medical V&L tasks by addressing domain-specific challenges.

Findings

1. BERTHop achieves 98.12% AUC on the OpenI dataset.
2. It outperforms the previous state of the art (SOTA) by 1.62% AUC.
3. It does so with 9 times less training data.
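
For readers unfamiliar with the metric, the following is a minimal sketch of how an AUC figure like the one above can be computed for a multi-label diagnosis task. It assumes the reported number is a macro-average of per-label AUCs, which this page does not state; the function names `binary_auc` and `macro_auc` and the toy data are hypothetical and are not taken from the BERTHop repository.

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC for one label: the probability that a randomly chosen positive
    example is scored above a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(label_matrix: Sequence[Sequence[int]],
              score_matrix: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of per-label AUCs. Rows are examples, columns are labels.
    Assumption: this mirrors how an 'average AUC' over diseases is usually reported."""
    n_labels = len(label_matrix[0])
    per_label = []
    for j in range(n_labels):
        col_labels = [row[j] for row in label_matrix]
        col_scores = [row[j] for row in score_matrix]
        per_label.append(binary_auc(col_labels, col_scores))
    return sum(per_label) / n_labels


# Toy data: 4 studies, 2 hypothetical disease labels.
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.3], [0.5, 0.1]]
print(macro_auc(toy_labels, toy_scores))
```

The pairwise definition is quadratic in the number of examples and is used here only for clarity; a rank-based implementation gives the same value more efficiently on real datasets.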

Abstract

Vision-and-language (V&L) models take an image and text as input and learn to capture the associations between them. Prior studies show that pre-trained V&L models can significantly improve model performance on downstream tasks such as Visual Question Answering (VQA). However, V&L models are less effective when applied in the medical domain (e.g., on X-ray images and clinical notes) due to the domain gap. In this paper, we investigate the challenges of applying pre-trained V&L models in medical applications. In particular, we identify that the visual representation in general V&L models is not suitable for processing medical data. To overcome this limitation, we propose BERTHop, a transformer-based model based on PixelHop++ and VisualBERT, for better capturing the associations between the two modalities. Experiments on the OpenI dataset, a commonly used thoracic disease diagnosis…

Code & Models

Repositories

masoud-monajati/BERTHop (PyTorch, official)

Taxonomy

Topics: COVID-19 diagnosis using AI · Multimodal Machine Learning Applications · Topic Modeling

Methods: VisualBERT