Self-distilled Masked Attention guided masked image modeling with noise Regularized Teacher (SMART) for medical image analysis
Jue Jiang, Aneesh Rangnekar, Chloe Min Seo Choi, Harini Veeraraghavan

TL;DR
This paper introduces SMARTFormer, a novel masked image modeling approach for medical images that uses semantic attention and a noisy teacher to improve transformer pretraining and downstream task accuracy, especially with limited data.
Contribution
The paper develops SMARTFormer, a hierarchical Swin transformer with semantic attention and a noisy teacher, enabling effective masked image modeling for 3D medical images.
Findings
Achieved 89.5% accuracy in classifying lung nodules.
Predicted lung cancer treatment response with 74% accuracy.
Improved unsupervised segmentation of organs and tumors.
Abstract
Pretraining vision transformers (ViT) with attention guided masked image modeling (MIM) has been shown to increase downstream accuracy for natural image analysis. The hierarchical shifted window (Swin) transformer, often used in medical image analysis, cannot use attention guided masking because it lacks an explicit [CLS] token, which is needed to compute attention maps for selective masking. We thus enhanced Swin with semantic class attention. We developed a co-distilled Swin transformer that incorporates a noisy momentum updated teacher to guide selective masking for MIM. Our approach, called semantic attention guided co-distillation with noisy teacher regularized Swin TransFormer (SMARTFormer), was applied to analyze 3D computed tomography datasets with lung nodules and malignant lung cancers (LC). We also analyzed the impact of semantic attention and…
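The two mechanisms named in the abstract, a momentum updated teacher and attention guided selection of patches to mask, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the Gaussian form of the teacher noise, and the choice to mask the most attended patches (rather than the least attended) are all assumptions made for illustration; the visible abstract does not specify them.

```python
import random


def ema_update(teacher, student, momentum=0.99, noise_std=0.0, rng=None):
    """Move teacher weights toward the student with an exponential moving average.

    noise_std > 0 adds Gaussian noise to each updated weight. How the paper
    injects noise into its teacher is not stated in the abstract, so this
    noise model is a hypothetical stand-in.
    """
    rng = rng or random.Random(0)
    updated = []
    for t, s in zip(teacher, student):
        value = momentum * t + (1.0 - momentum) * s
        if noise_std > 0:
            value += rng.gauss(0.0, noise_std)
        updated.append(value)
    return updated


def attention_guided_mask(attention, mask_ratio=0.5):
    """Return a boolean mask that hides the highest-attention patches.

    attention holds one score per patch, e.g. from a class-attention map.
    Masking the top-scoring patches is an assumption made for this sketch.
    """
    n_mask = int(round(mask_ratio * len(attention)))
    ranked = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    masked = set(ranked[:n_mask])
    return [i in masked for i in range(len(attention))]


if __name__ == "__main__":
    teacher = ema_update([1.0, 0.0], [0.0, 1.0], momentum=0.9)
    mask = attention_guided_mask([0.1, 0.7, 0.2, 0.9], mask_ratio=0.5)
    print(teacher, mask)
```

In a real pipeline the scalar lists would be replaced by tensors of model parameters and per-patch attention maps over 3D volumes; the arithmetic stays the same.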
Taxonomy
Topics: Cell Image Analysis Techniques · Radiomics and Machine Learning in Medical Imaging · Image Processing Techniques and Applications
Methods: Attention Is All You Need · Softmax · Linear Layer · Dense Connections · Stochastic Depth · Residual Connection · Multi-Head Attention · Layer Normalization · Swin Transformer · Mutual Information Machine/Mask Image Modeling
