DINO as a von Mises-Fisher mixture model
Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten

TL;DR
This paper reinterprets DINO as a von Mises-Fisher (vMF) mixture model, introduces DINO-vMF, which adds normalization constants when computing the cluster assignment probabilities, and reports improved training stability and downstream performance in self-supervised image representation learning.
Contribution
It provides a novel mixture model interpretation of DINO and proposes DINO-vMF, enhancing stability and downstream task performance.
Findings
Unlike DINO, DINO-vMF remains stable with unnormalized prototypes even for the larger ViT-Base model.
DINO-vMF outperforms DINO on various downstream tasks.
The mixture model interpretation improves understanding of self-supervised methods.
Abstract
Self-distillation methods using Siamese networks are popular for self-supervised pre-training. DINO is one such method based on a cross-entropy loss between K-dimensional probability vectors, obtained by applying a softmax function to the dot product between representations and learnt prototypes. Given the fact that the learned representations are L2-normalized, we show that DINO and its derivatives, such as iBOT, can be interpreted as a mixture model of von Mises-Fisher components. With this interpretation, DINO assumes equal precision for all components when the prototypes are also L2-normalized. Using this insight we propose DINO-vMF, that adds appropriate normalization constants when computing the cluster assignment probabilities. Unlike DINO, DINO-vMF is stable also for the larger ViT-Base model with unnormalized prototypes. We show that the added flexibility of the mixture…
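The assignment step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that each prototype w defines a vMF component with precision kappa = ||w|| / temperature, and that the log of the vMF normalization constant C_p(kappa) is added to the usual DINO logit before the softmax. The function names, the temperature value, and the series-based Bessel evaluation are all hypothetical choices made for illustration, and the sketch omits other parts of DINO such as teacher centering.

```python
import math


def log_bessel_i(v, x, terms=200):
    """log I_v(x) for x > 0, from the power series summed in log-space.

    Adequate for the small kappa values used here; not a general-purpose routine.
    """
    logs = [
        (2 * m + v) * math.log(x / 2.0) - math.lgamma(m + 1) - math.lgamma(m + v + 1)
        for m in range(terms)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(t - top) for t in logs))


def log_vmf_normalizer(kappa, p):
    """log C_p(kappa), the log normalization constant of a vMF density in R^p."""
    v = p / 2.0 - 1.0
    return v * math.log(kappa) - (p / 2.0) * math.log(2.0 * math.pi) - log_bessel_i(v, kappa)


def softmax(logits):
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assignment_probs(y, prototypes, temperature=0.5, use_vmf=True):
    """Cluster assignment probabilities for one L2-normalized representation y.

    use_vmf=False gives the plain DINO-style softmax over <w, y> / temperature.
    use_vmf=True adds log C_p(kappa) per prototype (assumed kappa = ||w|| / temperature).
    """
    p = len(y)
    logits = []
    for w in prototypes:
        logit = sum(a * b for a, b in zip(w, y)) / temperature
        if use_vmf:
            kappa = math.sqrt(sum(a * a for a in w)) / temperature
            logit += log_vmf_normalizer(kappa, p)
        logits.append(logit)
    return softmax(logits)


y = [1.0, 0.0, 0.0]
unit_prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mixed_prototypes = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

plain_unit = assignment_probs(y, unit_prototypes, use_vmf=False)
vmf_unit = assignment_probs(y, unit_prototypes, use_vmf=True)
plain_mixed = assignment_probs(y, mixed_prototypes, use_vmf=False)
vmf_mixed = assignment_probs(y, mixed_prototypes, use_vmf=True)
```

With unit-norm prototypes every component has the same kappa, so the added constant is identical across logits and cancels in the softmax. This matches the abstract's remark that DINO assumes equal precision when the prototypes are L2-normalized. With prototypes of different norms the two variants give different probabilities.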
Taxonomy
Topics: Bayesian Methods and Mixture Models
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Residual Connection · Vision Transformer · Softmax · self-DIstillation with NO labels
