Co-distilled attention guided masked image modeling with noisy teacher for self-supervised learning on medical images

Jue Jiang; Aneesh Rangnekar; Harini Veeraraghavan

arXiv:2604.14506·cs.CV·April 17, 2026

TL;DR

This paper introduces DAGMaN, a novel self-supervised learning method for medical images that uses attention-guided masking with a noisy teacher to improve feature learning and downstream task performance.

Contribution

It proposes a co-distillation framework with attention-guided masking and a noisy teacher to enhance SSL for medical images, addressing information leakage and attention diversity issues.

Findings

01. Effective in lung nodule classification and immunotherapy outcome prediction

02. Improves tumor segmentation and organ clustering performance

03. Addresses attention diversity loss with noisy teacher integration

Abstract

Masked image modeling (MIM) is a highly effective self-supervised learning (SSL) approach for extracting useful feature representations from unannotated data. The predominantly used random masking methods make SSL less effective for medical images because neighboring patches are contextually similar, which leads to information leakage and simplifies the SSL task. The hierarchical shifted window (Swin) transformer, a highly effective approach for medical images, cannot use advanced masking methods because it lacks a global [CLS] token. Hence, we introduced an attention-guided masking mechanism for Swin within a co-distillation learning framework to selectively mask semantically co-occurring and discriminative patches, reducing information leakage and increasing the difficulty of SSL pretraining. However, attention-guided masking inevitably reduces the diversity of attention heads, which negatively impacts…
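To make the contrast between random and attention-guided masking concrete, here is a minimal Python sketch. It is not the paper's algorithm: the function names, the top-k selection rule, and the example scores are all assumptions made for illustration. It only shows the general idea of masking the patches that a per-patch attention score marks as most salient, rather than picking patches uniformly at random.

```python
import random


def random_mask(num_patches, mask_ratio, seed=0):
    """Baseline: pick the patches to mask uniformly at random."""
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    return sorted(rng.sample(range(num_patches), num_masked))


def attention_guided_mask(attention_scores, mask_ratio):
    """Mask the patches with the highest attention scores.

    Assumption: a simple top-k rule stands in for the selection step;
    the source abstract does not specify the exact rule.
    """
    num_masked = int(round(len(attention_scores) * mask_ratio))
    ranked = sorted(
        range(len(attention_scores)),
        key=lambda i: attention_scores[i],
        reverse=True,
    )
    return sorted(ranked[:num_masked])


# Hypothetical per-patch attention scores for an 8-patch image.
scores = [0.05, 0.40, 0.10, 0.30, 0.02, 0.08, 0.03, 0.02]

guided = attention_guided_mask(scores, mask_ratio=0.5)
baseline = random_mask(len(scores), mask_ratio=0.5)

print("attention-guided mask:", guided)
print("random mask:", baseline)
```

With the guided rule, the most salient patches are always hidden, so the model cannot reconstruct them by copying from a near-identical visible neighbor as easily as it might under random masking.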
