CoMAD: A Multiple-Teacher Self-Supervised Distillation Framework

Sriram Mandalika; Lalitha V

arXiv:2508.04816 · cs.CV · August 8, 2025

TL;DR

CoMAD is a novel self-supervised distillation framework that unifies multiple vision transformer teachers into a compact student, improving performance on image classification and dense prediction tasks.

Contribution

Introduces a parameter-free, multi-teacher distillation method with asymmetric masking and consensus gating for efficient self-supervised learning.

Findings

01. Achieves 75.4% Top-1 accuracy on ImageNet-1K with ViT-Tiny.

02. Sets a new state of the art on dense prediction tasks among compact SSL models.

03. Improves on previous methods by effectively integrating priors from multiple teachers.

Abstract

Numerous self-supervised learning paradigms, such as contrastive learning and masked image modeling, learn powerful representations from unlabeled data but are typically pretrained in isolation, overlooking complementary insights and yielding large models that are impractical for resource-constrained deployment. To overcome these challenges, we introduce Consensus-oriented Masked Distillation (CoMAD), a lightweight, parameter-free framework that unifies knowledge from multiple current state-of-the-art self-supervised Vision Transformers into a compact student network. CoMAD distills from three pretrained ViT-Base teachers, MAE, MoCo v3, and iBOT, each offering distinct semantic and contextual priors. Rather than naively averaging teacher outputs, we apply asymmetric masking: the student sees only 25 percent of patches while each teacher receives a progressively lighter, unique mask,…
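
The asymmetric masking described in the abstract can be illustrated with a minimal sketch. Only the student's visibility (25 percent of patches) and the three teacher names come from the abstract. The teacher mask ratios, the 196-patch grid, the per-model seeds, and the helper `random_keep_indices` are hypothetical values chosen for illustration, not the authors' settings.

```python
import random


def random_keep_indices(num_patches, mask_ratio, seed):
    """Return the sorted indices of patches left visible after random masking."""
    rng = random.Random(seed)
    num_keep = round(num_patches * (1.0 - mask_ratio))
    return sorted(rng.sample(range(num_patches), num_keep))


# Assumption: a 14 x 14 patch grid (224 x 224 image, 16 x 16 patches).
NUM_PATCHES = 196

# From the abstract: the student sees only 25% of patches.
STUDENT_MASK_RATIO = 0.75

# Assumption: illustrative teacher ratios, each lighter than the one before.
# The abstract does not state the actual values.
TEACHER_MASK_RATIOS = {"MAE": 0.50, "MoCo v3": 0.40, "iBOT": 0.30}

student_visible = random_keep_indices(NUM_PATCHES, STUDENT_MASK_RATIO, seed=0)

# A different seed per teacher gives each one its own mask.
teacher_visible = {
    name: random_keep_indices(NUM_PATCHES, ratio, seed=i + 1)
    for i, (name, ratio) in enumerate(TEACHER_MASK_RATIOS.items())
}

print("student visible patches:", len(student_visible))
for name, visible in teacher_visible.items():
    print(name, "visible patches:", len(visible))
```

Under these assumed ratios, every teacher sees more of the image than the student does, which is the asymmetry the abstract refers to. How the teacher outputs are then combined is not covered by this sketch.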
