Do All Vision Transformers Need Registers? A Cross-Architectural Reassessment

Spiros Baxevanakis; Platon Karageorgis; Ioannis Dravilas; Konrad Szewczyk

arXiv:2603.25803 · cs.CV · March 30, 2026

TL;DR

This paper reassesses whether Vision Transformers need registers: it confirms several of the original findings, shows that others do not hold across all models and model sizes, and clarifies inconsistent terminology.

Contribution

It reproduces and extends prior work on registers in ViTs, analyzing their effects across multiple models and model sizes and clarifying terminology for broader applicability.

Findings

01 Registers improve attention map clarity in some models

02 Not all claims about registers generalize across models

03 Model size influences the impact of registers

Abstract

Training Vision Transformers (ViTs) presents significant challenges, one of which is the emergence of artifacts in attention maps, which hinders their interpretability. Darcet et al. (2024) investigated this phenomenon and attributed it to ViTs' need to store global information beyond the [CLS] token. They proposed a novel solution involving the addition of empty input tokens, named registers, which successfully eliminate artifacts and improve the clarity of attention maps. In this work, we reproduce the findings of Darcet et al. (2024) and evaluate the generalizability of their claims across multiple models, including DINO, DINOv2, OpenCLIP, and DeiT3. While we confirm the validity of several of their key claims, our results reveal that some claims do not extend universally to other models. Additionally, we explore the impact of model size, extending their findings to smaller models.…
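The register mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or the implementation of Darcet et al. (2024): the function names, the sizes, and the choice to place registers at the end of the sequence are all assumptions made for illustration, and random vectors stand in for learned embeddings. It shows only the bookkeeping: extra tokens are appended to the input sequence alongside [CLS] and the patch tokens, and their outputs are discarded afterwards.

```python
import random

def make_tokens(count, dim, rng):
    """Return `count` stand-in embedding vectors of length `dim`."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]

def add_registers(cls_token, patch_tokens, register_tokens):
    """Build the input sequence: [CLS], then patches, then registers.

    Placing registers last is an assumption of this sketch.
    """
    return [cls_token] + patch_tokens + register_tokens

def drop_registers(output_tokens, num_registers):
    """Discard register outputs, keeping [CLS] and the patch tokens."""
    if num_registers == 0:
        return list(output_tokens)
    return list(output_tokens[:-num_registers])

rng = random.Random(0)
dim, num_patches, num_registers = 8, 16, 4  # hypothetical sizes

cls_token = make_tokens(1, dim, rng)[0]
patch_tokens = make_tokens(num_patches, dim, rng)
# In a trained model these would be learned parameters, not random vectors.
register_tokens = make_tokens(num_registers, dim, rng)

sequence = add_registers(cls_token, patch_tokens, register_tokens)

# A real ViT would apply its transformer blocks here; identity stands in.
outputs = sequence

kept = drop_registers(outputs, num_registers)
print(len(sequence), len(kept))  # prints "21 17"
```

The sequence grows from 17 to 21 tokens during the forward pass, while downstream consumers still see only [CLS] and the 16 patch tokens.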
