Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification
Junfei Xiao, Yutong Bai, Alan Yuille, Zongwei Zhou

TL;DR
This study demonstrates that Vision Transformers pre-trained with Masked Autoencoders on chest X-rays can achieve performance comparable to state-of-the-art CNNs in multi-label thorax disease classification, with tailored pre-training and fine-tuning strategies.
Contribution
It applies Masked Autoencoder pre-training of ViTs to chest X-rays and provides empirical insights into effective pre-training and fine-tuning recipes for medical imaging tasks.
Findings
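A ViT pre-trained with MAE on 266,340 chest X-rays performs comparably to, and sometimes better than, DenseNet-121 on multi-label thorax disease classification.
MAE pre-training on medical images works best when the model reconstructs from a smaller visible portion of each image, with a more moderate random-crop range, than is usual for natural images.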
In-domain transfer learning and specific fine-tuning strategies improve performance.
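To make the "smaller image portions" finding concrete, here is a minimal NumPy sketch of MAE-style random patch masking. It is an illustration under stated assumptions, not the authors' code: the helper names `patchify` and `random_patch_mask`, the 224x224 input, the 16x16 patch size, and the 10% visible ratio are all chosen for the example.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W) grayscale image into flattened, non-overlapping patches."""
    h, w = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size)
    patches = patches.transpose(0, 2, 1, 3)  # (gh, gw, patch, patch)
    return patches.reshape(gh * gw, patch_size * patch_size)

def random_patch_mask(num_patches, visible_ratio, rng):
    """Return sorted indices of the visible patches and of the masked patches."""
    num_visible = max(1, int(round(num_patches * visible_ratio)))
    perm = rng.permutation(num_patches)
    return np.sort(perm[:num_visible]), np.sort(perm[num_visible:])

rng = np.random.default_rng(0)
image = rng.random((224, 224))     # random stand-in for a chest X-ray
patches = patchify(image, 16)      # 14 x 14 = 196 patches, 256 pixels each

# Assumed visible ratio for illustration only.
visible_idx, masked_idx = random_patch_mask(len(patches), 0.10, rng)

encoder_input = patches[visible_idx]        # only visible patches are encoded
reconstruction_target = patches[masked_idx]  # pixels the decoder must predict
```

In an MAE setup the encoder sees only `encoder_input`, and the reconstruction loss is computed on the masked patches, so lowering the visible ratio makes the pretext task harder while reducing encoder compute.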
Abstract
Vision Transformer (ViT) has become one of the most popular neural architectures due to its great scalability, computational efficiency, and compelling performance in many vision tasks. However, ViT has shown inferior performance to Convolutional Neural Network (CNN) on medical tasks due to its data-hungry nature and the lack of annotated medical data. In this paper, we pre-train ViTs on 266,340 chest X-rays using Masked Autoencoders (MAE), which reconstruct missing pixels from a small part of each image. For comparison, CNNs are also pre-trained on the same 266,340 X-rays using advanced self-supervised methods (e.g., MoCo v2). The results show that our pre-trained ViT performs comparably to (and sometimes better than) the state-of-the-art CNN (DenseNet-121) for multi-label thorax disease classification. This performance is attributed to the strong recipes extracted from our empirical studies for…
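For readers unfamiliar with the multi-label setting named in the abstract, the sketch below shows one common formulation: an independent sigmoid per disease with binary cross-entropy, so several findings can be positive on the same X-ray. The text above does not state which loss the authors use, so this is an assumption; the three-label example, the logits, and the 0.5 threshold are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def multilabel_bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy over all (sample, label) pairs."""
    probs = np.clip(sigmoid(logits), eps, 1.0 - eps)
    loss = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
    return loss.mean()

# Two hypothetical X-rays, three hypothetical disease labels.
logits = np.array([[2.0, -1.0, 0.5],
                   [-3.0, -0.2, 4.0]])
targets = np.array([[1.0, 0.0, 1.0],
                    [0.0, 0.0, 1.0]])

probs = sigmoid(logits)                  # each label scored independently
predicted = (probs >= 0.5).astype(int)   # assumed decision threshold of 0.5
loss = multilabel_bce(logits, targets)
```

Unlike a softmax, the per-label probabilities in each row of `probs` need not sum to one, which is what allows several diseases to be predicted for a single image.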
Taxonomy
Topics: COVID-19 diagnosis using AI · Radiomics and Machine Learning in Medical Imaging · AI in cancer detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Batch Normalization · InfoNCE · Adam · Label Smoothing
