Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification

Junfei Xiao; Yutong Bai; Alan Yuille; Zongwei Zhou

arXiv:2210.12843 · cs.CV · October 25, 2022 · 6 cites

TL;DR

This study demonstrates that Vision Transformers pre-trained with Masked Autoencoders on chest X-rays can achieve performance comparable to state-of-the-art CNNs in multi-label thorax disease classification, with tailored pre-training and fine-tuning strategies.
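
To make the "multi-label" setting concrete: each X-ray can carry several disease labels at once, so a common formulation scores every class independently with a sigmoid and trains with binary cross-entropy. The sketch below assumes that formulation; the summary above does not state which loss the authors use, and the logits, targets, and 0.5 threshold are made-up illustrative values.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy over all (sample, class) pairs.

    logits:  (batch, num_classes) raw scores, one per disease label
    targets: (batch, num_classes) 0/1 labels; several may be 1 per row
    """
    probs = np.clip(sigmoid(logits), eps, 1.0 - eps)  # avoid log(0)
    loss = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
    return float(loss.mean())

# Hypothetical batch: 2 images, 3 disease classes.
logits = np.array([[2.0, -1.0, 0.0], [-3.0, 0.5, 1.5]])
targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])

loss = multilabel_bce(logits, targets)
# Each class is thresholded on its own, so a row can predict 0, 1, or many labels.
preds = (sigmoid(logits) >= 0.5).astype(int)
```

Unlike a softmax head, the per-class sigmoid does not force the class probabilities to sum to one, which is what allows co-occurring findings on a single image.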

Contribution

The paper pre-trains ViTs with Masked Autoencoders on chest X-rays and distills empirical pre-training and fine-tuning recipes for medical imaging tasks.

Findings

01. ViT with MAE pre-training performs comparably to DenseNet-121.

02. Reconstruction on medical images works best from a smaller visible portion of each image and a more moderate random-crop range than is typical for natural images.

03. In-domain transfer learning and specific fine-tuning strategies improve performance.
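
The "crop range" in finding 02 refers to random resized cropping, where each training image is cropped to a random fraction of its area before being resized. A minimal sketch of how such a crop box can be sampled follows. The function name `sample_crop_box`, the scale range (0.5, 1.0), and the aspect-ratio range are illustrative assumptions, not values taken from this page.

```python
import math
import random

def sample_crop_box(height, width, scale=(0.5, 1.0), ratio=(3 / 4, 4 / 3),
                    rng=None, max_tries=10):
    """Sample (top, left, h, w) for a random resized crop.

    scale: range of crop area as a fraction of the image area (assumed values)
    ratio: range of crop aspect ratio w/h, sampled log-uniformly
    """
    rng = rng or random.Random()
    area = height * width
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(max_tries):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            top = rng.randint(0, height - h)   # inclusive bounds
            left = rng.randint(0, width - w)
            return top, left, h, w
    # Fallback when no sampled box fits: use the whole image.
    return 0, 0, height, width

box = sample_crop_box(224, 224, rng=random.Random(0))
```

Raising the lower bound of `scale` makes the augmentation more moderate: every crop then keeps a larger share of the original X-ray.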

Abstract

Vision Transformer (ViT) has become one of the most popular neural architectures due to its great scalability, computational efficiency, and compelling performance in many vision tasks. However, ViT has shown inferior performance to Convolutional Neural Networks (CNNs) on medical tasks due to its data-hungry nature and the lack of annotated medical data. In this paper, we pre-train ViTs on 266,340 chest X-rays using Masked Autoencoders (MAE), which reconstruct missing pixels from a small part of each image. For comparison, CNNs are also pre-trained on the same 266,340 X-rays using advanced self-supervised methods (e.g., MoCo v2). The results show that our pre-trained ViT performs comparably to (and sometimes better than) the state-of-the-art CNN (DenseNet-121) for multi-label thorax disease classification. This performance is attributed to the strong recipes extracted from our empirical studies for…
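
The masking step the abstract describes (reconstructing missing pixels from a small visible part of each image) can be sketched as follows. This is a simplified NumPy illustration, not the authors' implementation: the helper names, the 16-pixel patch size, and `keep_ratio=0.1` are assumptions chosen for the example, and the encoder and decoder are omitted entirely.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W) grayscale image into flattened non-overlapping patches."""
    h, w = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    x = image.reshape(gh, patch_size, gw, patch_size).transpose(0, 2, 1, 3)
    return x.reshape(gh * gw, patch_size * patch_size)

def random_masking(patches, keep_ratio, rng):
    """Keep a random subset of patches visible; mark the rest as masked.

    Returns the visible patches, a mask (1 = masked, 0 = visible),
    and the sorted indices of the visible patches.
    """
    num_patches = patches.shape[0]
    num_keep = max(1, int(round(num_patches * keep_ratio)))
    keep_idx = np.sort(rng.permutation(num_patches)[:num_keep])
    mask = np.ones(num_patches, dtype=np.int64)
    mask[keep_idx] = 0
    return patches[keep_idx], mask, keep_idx

def masked_mse(pred, target, mask):
    """Reconstruction error averaged over masked patches only."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float((per_patch * mask).sum() / max(int(mask.sum()), 1))

rng = np.random.default_rng(0)
image = rng.random((224, 224))        # stand-in for a chest X-ray
patches = patchify(image, 16)         # 14 x 14 grid of 16 x 16 patches
visible, mask, keep_idx = random_masking(patches, keep_ratio=0.1, rng=rng)
# An encoder would see only `visible`; a decoder would predict all patches,
# and the loss would be computed on the masked ones via `masked_mse`.
```

Because the encoder processes only the visible patches, a lower `keep_ratio` both hardens the reconstruction task and reduces the encoder's compute per image.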

Code & Models

Repositories

lambert-x/medical_mae (PyTorch, official)

Taxonomy

Topics: COVID-19 diagnosis using AI · Radiomics and Machine Learning in Medical Imaging · AI in cancer detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Batch Normalization · InfoNCE · Adam · Label Smoothing