Simplifying DINO via Coding Rate Regularization

Ziyang Wu, Jingyuan Zhang, Druv Pai, XuDong Wang, Chandan Singh, Jianwei Yang, Jianfeng Gao, Yi Ma

arXiv:2502.10385 · cs.CV · February 17, 2025


TL;DR

This paper introduces simplified versions of DINO and DINOv2 models, called SimDINO and SimDINOv2, that use coding rate regularization to improve robustness, stability, and downstream task performance without complex training procedures.

Contribution

The authors propose a minimalist approach: add a coding rate regularization term and remove complex heuristics, yielding more robust and higher-performing self-supervised vision models.
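For orientation, a coding rate term of this kind is commonly defined as in the rate-reduction (MCR²) line of work, shown below. The exact constants, which features Z are regularized, the weight γ, and the form of the alignment term are assumptions of this sketch, not details stated on this page.

```latex
% Coding rate of n feature vectors z_i in R^d, stacked as columns of Z
R_{\varepsilon}(Z) = \frac{1}{2}\,\log\det\!\Big(I_d + \frac{d}{n\,\varepsilon^{2}}\, Z Z^{\top}\Big),
\qquad Z = [z_1, \dots, z_n] \in \mathbb{R}^{d \times n}

% Assumed objective: student/teacher alignment minus a coding rate bonus
\mathcal{L} = \frac{1}{n}\sum_{i=1}^{n} \big\lVert z_i^{s} - z_i^{t} \big\rVert_2^{2} \;-\; \gamma\, R_{\varepsilon}(Z^{s})
```

The intuition for why this prevents collapse: if every unit-norm feature equals the same vector u, then Z Zᵀ = n u uᵀ and the rate is only ½ log(1 + d/ε²), whereas features spread evenly over the sphere give Z Zᵀ ≈ (n/d) I and a rate of roughly (d/2) log(1 + 1/ε²), so minimizing the loss pushes representations apart.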

Findings

01

SimDINO and SimDINOv2 outperform original models on downstream tasks.

02

Simplified models are more robust to hyperparameter and architecture choices.

03

The approach reduces training complexity and improves stability.

Abstract

DINO and DINOv2 are two model families that are widely used to learn representations from unlabeled image data at large scale. Their learned representations often enable state-of-the-art performance on downstream tasks such as image classification and segmentation. However, they employ many empirically motivated design choices, and their training pipelines are highly complex and unstable: many hyperparameters need to be carefully tuned to ensure that the representations do not collapse, which makes it considerably difficult to improve them or adapt them to new domains. In this work, we posit that we can remove most such-motivated idiosyncrasies in the pre-training pipelines and only need to add an explicit coding rate term to the loss function to avoid collapse of the representations. As a result, we obtain highly simplified variants of DINO and DINOv2, which we call SimDINO…
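A minimal NumPy sketch of a loss in this spirit follows: an alignment term between student and teacher features, minus a weighted coding rate that penalizes collapse. It assumes L2-normalized features, a mean squared distance for alignment, and a coding rate computed on the student batch; the names `coding_rate`, `l2_normalize`, `simdino_style_loss` and the defaults `eps=0.5`, `gamma=1.0` are hypothetical and not taken from the authors' code.

```python
import numpy as np


def coding_rate(Z: np.ndarray, eps: float = 0.5) -> float:
    """R_eps(Z) = 1/2 * logdet(I_d + d / (n * eps^2) * Z^T Z) for Z of shape (n, d)."""
    n, d = Z.shape
    alpha = d / (n * eps ** 2)
    # Rows are samples, so Z^T Z is the (d, d) Gram matrix over feature dimensions.
    gram = Z.T @ Z
    # slogdet avoids overflow/underflow of det(); the matrix is positive definite.
    _, logdet = np.linalg.slogdet(np.eye(d) + alpha * gram)
    return 0.5 * float(logdet)


def l2_normalize(Z: np.ndarray) -> np.ndarray:
    # Project each feature vector onto the unit sphere (assumed preprocessing).
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)


def simdino_style_loss(z_student, z_teacher, gamma=1.0, eps=0.5):
    """Hypothetical loss: alignment term minus gamma times the coding rate."""
    zs = l2_normalize(z_student)
    zt = l2_normalize(z_teacher)
    # Alignment: mean squared Euclidean distance between paired features.
    align = float(np.mean(np.sum((zs - zt) ** 2, axis=1)))
    # Anti-collapse: subtracting the coding rate rewards spread-out student features.
    # NumPy has no autograd, so this only evaluates the loss value.
    return align - gamma * coding_rate(zs, eps)
```

With identical student and teacher batches the alignment term is zero and the loss reduces to −γ·R, so a batch collapsed onto a single vector receives a higher (worse) loss than a batch spread over the sphere. In a real training loop this would be written in an autodiff framework, with the teacher branch handled as in DINO.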


Code & Models

Models

🤗 nielsr/simdino-base-16 · model · 3 downloads


Taxonomy

Topics: Advanced Data Compression Techniques

Methods: Softmax · Dense Connections · Linear Layer · Residual Connection · Layer Normalization · Attention Is All You Need · Multi-Head Attention · Vision Transformer · self-DIstillation with NO labels