Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2
Joel Valdivia Ortega, Lorenz Lamm, Franziska Eckardt, Benedikt Schworm, Marion Jasnin, Tingying Peng

TL;DR
This paper introduces Randomized-MLP regularization for DINOv2, which enhances interpretability and domain adaptation in vision transformers, especially in medical imaging, without sacrificing performance.
Contribution
It proposes a contrastive learning-based RMLP regularization method for fine-tuning ViTs, improving interpretability and domain robustness in vision models.
Findings
RMLP improves interpretability of attention maps.
RMLP maintains or enhances downstream performance.
A mathematical analysis of RMLPs offers insight into their role in enhancing ViT-based models and into contrastive learning more broadly.
Abstract
Vision Transformers (ViTs), such as DINOv2, achieve strong performance across domains but often repurpose low-informative patch tokens in ways that reduce the interpretability of attention and feature maps. This challenge is especially evident in medical imaging, where domain shifts can degrade both performance and transparency. In this paper, we introduce Randomized-MLP (RMLP) regularization, a contrastive learning-based method that encourages more semantically aligned representations. We apply RMLP regularization when fine-tuning DINOv2 on both medical and natural image modalities, showing that it improves or maintains downstream performance while producing more interpretable attention maps. We also provide a mathematical analysis of RMLPs, offering insights into their role in enhancing ViT-based models and advancing our understanding of contrastive learning.
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis
