SINDER: Repairing the Singular Defects of DINOv2

Haoqi Wang; Tong Zhang; Mathieu Salzmann

arXiv:2407.16826 · cs.CV · July 25, 2024

TL;DR

This paper investigates artifacts in the patch tokens of DINOv2 Vision Transformer models, traces them to the leading left singular vector of the network's weights, and proposes a smooth regularization applied during fine-tuning that repairs these defects without re-training the full model.

Contribution

The paper uncovers the root cause of patch token artifacts in Vision Transformers and introduces a novel regularization technique for effective model repair without full re-training.

Findings

1. Artifacts originate from the leading left singular vector of the network's weights.
2. The proposed regularization improves downstream task performance.
3. The method avoids full re-training, saving computational resources.
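To make the first finding more concrete, the following is a minimal NumPy sketch of how a dominant singular direction in a weight matrix can pull the outputs of generic inputs toward one vector. It uses a toy random matrix, not DINOv2 weights, and the names `left_dir`, `right_dir`, and `alignment_with_direction` are hypothetical choices made for this illustration. It is not the analysis carried out in the paper.

```python
import numpy as np


def leading_left_singular_vector(weight):
    """Return the left singular vector of the largest singular value, plus all singular values."""
    u, s, _ = np.linalg.svd(weight, full_matrices=False)
    return u[:, 0], s


def alignment_with_direction(vectors, direction):
    """Absolute cosine similarity between each row of `vectors` and `direction`."""
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(direction)
    return np.abs(vectors @ direction) / norms


rng = np.random.default_rng(0)
dim = 64

# Assumption: a toy weight made of small random noise plus one strong rank-one term.
left_dir = rng.standard_normal(dim)
left_dir /= np.linalg.norm(left_dir)
right_dir = rng.standard_normal(dim)
right_dir /= np.linalg.norm(right_dir)
weight = 0.1 * rng.standard_normal((dim, dim)) + 20.0 * np.outer(left_dir, right_dir)

u1, singular_values = leading_left_singular_vector(weight)

# Push random "tokens" through the linear map y = W x and measure how closely
# each output lines up with the leading left singular vector.
tokens = rng.standard_normal((256, dim))
outputs = tokens @ weight.T
alignment = alignment_with_direction(outputs, u1)

print("top two singular values:", singular_values[:2])
print("mean |cos(output, u1)|:", alignment.mean())
```

In this toy setting the largest singular value is far larger than the rest, so most outputs point roughly along `u1` regardless of the input. Real network weights are of course not constructed this way.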

Abstract

Vision Transformer models trained on large-scale datasets, although effective, often exhibit artifacts in the patch tokens they extract. While such defects can be alleviated by re-training the entire model with additional classification tokens, the underlying reasons for the presence of these tokens remain unclear. In this paper, we conduct a thorough investigation of this phenomenon, combining theoretical analysis with empirical observations. Our findings reveal that these artifacts originate from the pre-trained network itself, specifically stemming from the leading left singular vector of the network's weights. Furthermore, to mitigate these defects, we propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset, thereby avoiding the need for complete re-training. We validate our method on various downstream tasks, including…
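The abstract does not spell out the form of the smooth regularization. As a rough intuition for what a smoothness penalty on patch tokens could look like, here is a generic sketch that penalizes each token's distance from the mean of its four grid neighbors. This is an assumption made for illustration only: the function names, the 4-neighbor choice, and the edge handling are hypothetical and should not be read as the loss defined in the paper.

```python
import numpy as np


def neighbor_mean(grid):
    """Mean of the 4-connected neighbors of every cell; borders are edge-replicated."""
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)), mode="edge")
    up = padded[:-2, 1:-1]
    down = padded[2:, 1:-1]
    left = padded[1:-1, :-2]
    right = padded[1:-1, 2:]
    return (up + down + left + right) / 4.0


def smoothness_penalty(patch_tokens):
    """Average squared distance between each token and its neighbor mean.

    patch_tokens: array of shape (H, W, C), one C-dimensional feature per patch.
    """
    diff = patch_tokens - neighbor_mean(patch_tokens)
    return float(np.mean(np.sum(diff ** 2, axis=-1)))


# A perfectly uniform feature map versus the same map with one outlier token.
flat = np.ones((4, 4, 3))
spiky = flat.copy()
spiky[2, 2] += 10.0

print("penalty (uniform map):", smoothness_penalty(flat))
print("penalty (one outlier):", smoothness_penalty(spiky))
```

A penalty of this kind is zero on a uniform feature map and grows when an isolated token departs sharply from its surroundings, which is the qualitative behavior one would want when suppressing isolated artifact tokens.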

Code & Models

Repositories

haoqiwang/sinder (PyTorch, official)

Models

haoqiwang/sinder (Hugging Face model)

Taxonomy

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention · Dense Connections