Zero-Ablation Overstates Register Content Dependence in DINO Vision Transformers
Felipe Parodi, Jordan Matelsky, Melanie Segado

TL;DR
Zero-ablation overstates how much vision transformers depend on exact register content: replacing registers with mean, noise, or cross-image shuffled values preserves performance, so plausible register-like activations are sufficient.
Contribution
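The paper shows that zero-ablation overstates dependence on exact register content in DINOv2+registers and DINOv3, and introduces three replacement controls (mean-substitution, noise-substitution, and cross-image register-shuffling) that give a less inflated estimate of functional importance.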
Findings
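Replacement controls preserve performance on classification, correspondence, and segmentation even though they measurably perturb internal representations.
Zero-ablation perturbs internal representations disproportionately more than the replacement controls, consistent with it being the only intervention that degrades task performance.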
Register-like activations, not exact values, underpin performance in frozen-feature evaluations.
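The perturbation referred to in these findings is measured in the abstract by per-patch cosine similarity. A minimal sketch of such a measurement follows, assuming patch features are available as plain lists of floats for the unmodified run and for the run with modified registers. The function names and the convention of returning 0.0 for a zero-norm vector are illustrative assumptions, not the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumed convention: similarity is undefined here, report 0.0.
        return 0.0
    return dot / (norm_u * norm_v)


def per_patch_cosine(baseline_patches, perturbed_patches):
    """One similarity per patch position, comparing the same patch
    before and after the register intervention."""
    return [
        cosine_similarity(u, v)
        for u, v in zip(baseline_patches, perturbed_patches)
    ]


def mean_patch_cosine(baseline_patches, perturbed_patches):
    """Average the per-patch similarities into a single score."""
    sims = per_patch_cosine(baseline_patches, perturbed_patches)
    return sum(sims) / len(sims)
```

A score near 1 means the intervention left patch representations almost unchanged; lower scores indicate a larger perturbation.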
Abstract
Zero-ablation -- replacing token activations with zero vectors -- is widely used to probe token function in vision transformers. Register zeroing in DINOv2+registers and DINOv3 produces large drops (up to \,pp classification, \,pp segmentation), suggesting registers are functionally indispensable. However, three replacement controls -- mean-substitution, noise-substitution, and cross-image register-shuffling -- preserve performance across classification, correspondence, and segmentation, remaining within \,pp of the unmodified baseline. Per-patch cosine similarity shows these replacements genuinely perturb internal representations, while zeroing causes disproportionately large perturbations, consistent with why it alone degrades tasks. We conclude that zero-ablation overstates dependence on exact register content. In the frozen-feature evaluations we test,…
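The four interventions named in the abstract can be sketched on toy data as follows. This is an illustration under stated assumptions, not the authors' implementation: registers are stored as `registers[image][slot][dim]`, the mean and noise statistics are computed over the batch per slot and dimension, and the noise is Gaussian. The abstract does not specify any of these choices.

```python
import random


def zero_ablate(registers):
    """Zero-ablation: replace every register vector with zeros."""
    return [[[0.0] * len(vec) for vec in image] for image in registers]


def _slot_means(registers):
    # Mean of each register slot and dimension over the batch (assumption:
    # the paper may use dataset-level statistics instead).
    n = len(registers)
    return [
        [sum(registers[b][r][d] for b in range(n)) / n
         for d in range(len(registers[0][r]))]
        for r in range(len(registers[0]))
    ]


def mean_substitute(registers):
    """Mean-substitution: every image receives the batch-mean registers."""
    means = _slot_means(registers)
    return [[list(slot) for slot in means] for _ in registers]


def noise_substitute(registers, rng):
    """Noise-substitution: Gaussian noise matched to the batch mean and
    standard deviation of each slot and dimension (assumed noise model)."""
    n = len(registers)
    means = _slot_means(registers)
    out = []
    for _ in range(n):
        image = []
        for r, slot in enumerate(means):
            vec = []
            for d, mu in enumerate(slot):
                var = sum((registers[i][r][d] - mu) ** 2 for i in range(n)) / n
                vec.append(rng.gauss(mu, var ** 0.5))
            image.append(vec)
        out.append(image)
    return out


def shuffle_across_images(registers, rng):
    """Cross-image shuffling: each image receives the registers computed
    for some image in the batch, chosen by a random permutation."""
    order = list(range(len(registers)))
    rng.shuffle(order)
    return [[list(vec) for vec in registers[j]] for j in order]
```

All three controls keep the replaced values within the range of real register activations, whereas zeroing does not, which is the contrast the abstract draws.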
