BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis
Masoud Monajatipoor, Mozhdeh Rouhsedaghat, Liunian Harold Li, Aichi Chien, C.-C. Jay Kuo, Fabien Scalzo, and Kai-Wei Chang

TL;DR
BERTHop is a transformer-based vision-and-language model tailored for chest X-ray disease diagnosis, effectively capturing modality associations and outperforming state-of-the-art models on a standard benchmark.
Contribution
The paper introduces BERTHop, a model that combines PixelHop++ and VisualBERT to improve performance on medical V&L tasks. It targets the domain gap, in particular the finding that the visual representation used in general V&L models is not suitable for medical data.
Findings
BERTHop achieves 98.12% AUC on the OpenI dataset.
It outperforms the prior state of the art (SOTA) by 1.62% AUC.
It requires 9 times less training data.
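The findings above are stated in terms of AUC (area under the ROC curve). As a reminder of what that metric measures, here is a minimal sketch in plain Python using the rank-sum formulation. This page does not say how the paper computed or averaged its scores, so the per-label averaging in `mean_auc`, the function names, and the toy labels and scores are all assumptions made for illustration, not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC for one binary label via the rank-sum (Mann-Whitney U) formula.

    labels: list of 0/1 ground-truth values.
    scores: list of predicted scores, higher meaning "more likely positive".
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def mean_auc(label_columns, score_columns):
    """Unweighted mean of per-label AUCs (assumed averaging scheme)."""
    aucs = [roc_auc(y, s) for y, s in zip(label_columns, score_columns)]
    return sum(aucs) / len(aucs)


# Toy data, invented for illustration only: two labels with hypothetical scores.
toy_labels = [[0, 0, 1, 1], [0, 1]]
toy_scores = [[0.1, 0.4, 0.35, 0.8], [0.2, 0.9]]
print(mean_auc(toy_labels, toy_scores))
```

An AUC of 1.0 means every positive case is scored above every negative case, and 0.5 corresponds to random ranking, which is why results near 98% leave little headroom.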
Abstract
Vision-and-language (V&L) models take image and text as input and learn to capture the associations between them. Prior studies show that pre-trained V&L models can significantly improve model performance on downstream tasks such as Visual Question Answering (VQA). However, V&L models are less effective when applied in the medical domain (e.g., on X-ray images and clinical notes) due to the domain gap. In this paper, we investigate the challenges of applying pre-trained V&L models in medical applications. In particular, we identify that the visual representation in general V&L models is not suitable for processing medical data. To overcome this limitation, we propose BERTHop, a transformer-based model based on PixelHop++ and VisualBERT, for better capturing the associations between the two modalities. Experiments on the OpenI dataset, a commonly used thoracic disease diagnosis…
Taxonomy
Topics: COVID-19 diagnosis using AI · Multimodal Machine Learning Applications · Topic Modeling
Methods: VisualBERT
