Leveraging Registers in Vision Transformers for Robust Adaptation

Srikar Yellapragada; Kowshik Thopalli; Vivek Narayanaswamy; Wesam Sakla; Yang Liu; Yamen Mubarka; Dimitris Samaras; Jayaraman J. Thiagarajan

arXiv:2501.04784·cs.CV·January 10, 2025

Open Access

TL;DR

This paper introduces a method using register tokens in Vision Transformers to improve out-of-distribution generalization and anomaly detection without extra computational cost.

Contribution

It proposes a simple technique that combines CLS and register embeddings to make ViTs more robust in OOD scenarios, a setting in which registers remain relatively unexplored.

Findings

01. 2-4% improvement in OOD accuracy

02. 2-3% reduction in false positive rates

03. Maintains in-distribution performance
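
The false positive rate cited in the findings depends on where the detection threshold is set. The summary here does not say which operating point was used, so the following is only a minimal sketch of one common convention: fix the true positive rate on in-distribution (ID) samples, then count how many OOD samples still pass. The function name `fpr_at_tpr`, the score direction (higher means more in-distribution), and the toy scores are all assumptions made for illustration.

```python
import math
from typing import Sequence


def fpr_at_tpr(
    id_scores: Sequence[float],
    ood_scores: Sequence[float],
    target_tpr: float = 0.95,
) -> float:
    """False positive rate at the threshold that accepts `target_tpr` of ID samples.

    Assumed convention: a higher score means "more in-distribution", and a
    false positive is an OOD sample scoring at or above the threshold.
    """
    if not id_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0.0 < target_tpr <= 1.0:
        raise ValueError("target_tpr must be in (0, 1]")

    ranked = sorted(id_scores, reverse=True)
    # Smallest number of top-ranked ID samples that reaches the target TPR.
    keep = math.ceil(target_tpr * len(ranked))
    threshold = ranked[keep - 1]

    false_positives = sum(1 for score in ood_scores if score >= threshold)
    return false_positives / len(ood_scores)


# Toy scores (made up): accepting half of the ID samples puts the threshold at 0.8.
id_scores = [0.9, 0.8, 0.7, 0.6]
ood_scores = [0.85, 0.5, 0.3, 0.1]
fpr = fpr_at_tpr(id_scores, ood_scores, target_tpr=0.5)
print(fpr)  # 0.25: one of the four OOD samples scores at or above 0.8
```

With tied ID scores the achieved true positive rate can exceed the target, so this sketch reports a slightly conservative operating point in that case.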

Abstract

Vision Transformers (ViTs) have shown success across a variety of tasks due to their ability to capture global image representations. Recent studies have identified high-norm tokens in ViTs, which can interfere with unsupervised object discovery. To address this, prior work has proposed "registers": additional tokens that isolate high-norm patch tokens while capturing global image-level information. While registers have been studied extensively for object discovery, their generalization properties, particularly in out-of-distribution (OOD) scenarios, remain underexplored. In this paper, we examine the utility of register token embeddings in providing additional features for improving generalization and anomaly rejection. To that end, we propose a simple method that combines the special CLS token embedding commonly employed in ViTs with the average-pooled…
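
The abstract is cut off before it describes the fusion step, so the following is only a minimal sketch of the general idea stated in the contribution summary: building an image-level feature from the CLS embedding and the register embeddings. It assumes the final-layer tokens are ordered as CLS, then registers, then patches, that the registers are the tokens being average-pooled, and that the two parts are joined by concatenation. None of these details is confirmed by the text on this page, and the names `mean_pool` and `combine_cls_and_registers` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(tokens: Sequence[Sequence[float]]) -> Vector:
    """Average equal-length token embeddings, one dimension at a time."""
    if not tokens:
        raise ValueError("need at least one token to pool")
    count = len(tokens)
    return [sum(values) / count for values in zip(*tokens)]


def combine_cls_and_registers(
    tokens: Sequence[Sequence[float]], num_registers: int
) -> Vector:
    """Build one image-level feature from a ViT's final-layer tokens.

    Assumed layout (hypothetical): [CLS, reg_1..reg_R, patch_1..patch_N].
    Assumed fusion (hypothetical): concatenate CLS with the pooled registers.
    """
    if num_registers < 1 or len(tokens) < 1 + num_registers:
        raise ValueError("token sequence too short for the assumed layout")
    cls_embedding = list(tokens[0])
    register_tokens = tokens[1 : 1 + num_registers]
    return cls_embedding + mean_pool(register_tokens)


# Toy example: 1 CLS token, 2 registers, 1 patch token, embedding width 2.
toy_tokens = [
    [1.0, 2.0],  # CLS
    [0.0, 4.0],  # register 1
    [2.0, 0.0],  # register 2
    [9.0, 9.0],  # patch token (not used by this feature)
]
feature = combine_cls_and_registers(toy_tokens, num_registers=2)
print(feature)  # [1.0, 2.0, 1.0, 2.0]
```

Under these assumptions the register tokens are already produced by the forward pass, so the only added work is a mean and a concatenation; a downstream classifier or anomaly scorer would then take the wider feature as input.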

Taxonomy

Topics: Advanced Vision and Imaging · 3D Surveying and Cultural Heritage · Industrial Vision Systems and Defect Detection