Multi-modal Masked Siamese Network Improves Chest X-Ray Representation Learning
Saeed Shurrab, Alejandro Guerra-Manzanares, Farah E. Shamout

TL;DR
This paper introduces a novel self-supervised learning approach that incorporates Electronic Health Records (EHR) data into a Masked Siamese Network to improve chest X-ray image representations, outperforming existing methods.
Contribution
The authors present this as the first work to integrate EHR data into self-supervised pretraining for chest X-ray analysis with a Masked Siamese Network, with the aim of improving representation quality.
Findings
Significant improvement over vanilla MSN and baselines in linear evaluation.
Effective use of three types of EHR data: demographic, scan metadata, and inpatient stay information.
Validated on three public chest X-ray datasets with two ViT backbones.
Abstract
Self-supervised learning methods for medical images primarily rely on the imaging modality during pretraining. While such approaches deliver promising results, they do not leverage associated patient or scan information collected within Electronic Health Records (EHR). Here, we propose to incorporate EHR data during self-supervised pretraining with a Masked Siamese Network (MSN) to enhance the quality of chest X-ray representations. We investigate three types of EHR data, including demographic, scan metadata, and inpatient stay information. We evaluate our approach on three publicly available chest X-ray datasets, MIMIC-CXR, CheXpert, and NIH-14, using two vision transformer (ViT) backbones, specifically ViT-Tiny and ViT-Small. In assessing the quality of the representations via linear evaluation, our proposed method demonstrates significant improvement compared to vanilla MSN and…
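The pretraining idea described in the abstract can be illustrated with a minimal sketch of an MSN-style objective in which EHR features accompany the image embeddings. The abstract does not say how the EHR data is fused, so everything specific here is an assumption made for illustration: the `fuse` helper (plain concatenation), attaching the same EHR vector to both views, the temperature values, and the toy prototypes. The mean-entropy regularizer used in the original MSN formulation is omitted for brevity.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (guarding against a zero norm)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_probs(embedding, prototypes, temperature):
    """Soft assignment of an embedding to prototypes via cosine similarity."""
    z = l2_normalize(embedding)
    sims = [sum(a * b for a, b in zip(z, l2_normalize(p))) for p in prototypes]
    return softmax([s / temperature for s in sims])


def fuse(image_embedding, ehr_features):
    """Hypothetical fusion step: concatenate image and EHR vectors.

    This is an assumption for illustration, not the paper's confirmed design.
    """
    return list(image_embedding) + list(ehr_features)


def msn_style_loss(anchor, target, prototypes, tau_anchor=0.1, tau_target=0.025):
    """Cross-entropy between target and anchor prototype assignments.

    The target (unmasked view) uses a lower temperature, which sharpens its
    distribution; the anchor (masked view) is trained to match it.
    """
    p_anchor = prototype_probs(anchor, prototypes, tau_anchor)
    p_target = prototype_probs(target, prototypes, tau_target)
    return -sum(t * math.log(a + 1e-12) for t, a in zip(p_target, p_anchor))


# Toy example: 2-d image embeddings plus a 1-d EHR feature (e.g. scaled age).
ehr = [0.5]
prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # illustrative, not learned

target_view = fuse([0.9, 0.2], ehr)        # embedding of the unmasked view
matching_anchor = fuse([1.0, 0.1], ehr)    # masked view of the same scan
mismatched_anchor = fuse([0.1, 1.0], ehr)  # embedding far from the target

loss_match = msn_style_loss(matching_anchor, target_view, prototypes)
loss_mismatch = msn_style_loss(mismatched_anchor, target_view, prototypes)
print(loss_match < loss_mismatch)
```

In this toy setup the anchor that agrees with the target view incurs the lower loss, which is the behaviour a Siamese pretraining objective rewards. In a real pipeline the embeddings would come from a ViT encoder and the prototypes would be learned jointly with it.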
Taxonomy
Topics: COVID-19 diagnosis using AI · Radiomics and Machine Learning in Medical Imaging · Lung Cancer Diagnosis and Treatment
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Siamese Network · Residual Connection · Multi-Head Attention · Vision Transformer
