VisualAD: Language-Free Zero-Shot Anomaly Detection via Vision Transformer

Yanning Hou; Peiyuan Li; Zirui Liu; Yitong Wang; Yanran Ruan; Jianfeng Qiu; Ke Xu

arXiv:2603.07952 · cs.CV · March 10, 2026

Open Access

TL;DR

VisualAD is a purely visual, Vision Transformer-based approach to zero-shot anomaly detection that removes the text encoder entirely and reports state-of-the-art results on 13 benchmarks.

Contribution

It proposes a novel vision transformer framework with learnable tokens and spatial-aware modules for zero-shot anomaly detection, removing reliance on cross-modal text-image alignment.

Findings

01. Achieves state-of-the-art performance on 13 benchmarks

02. Works seamlessly with pretrained vision backbones like CLIP and DINOv2

03. Operates without text encoders, simplifying the ZSAD pipeline

Abstract

Zero-shot anomaly detection (ZSAD) requires detecting and localizing anomalies without access to target-class anomaly samples. Mainstream methods rely on vision-language models (VLMs) such as CLIP: they build hand-crafted or learned prompt sets for normal and abnormal semantics, then compute image-text similarities for open-set discrimination. While effective, this paradigm depends on a text encoder and cross-modal alignment, which can lead to training instability and parameter redundancy. This work revisits the necessity of the text branch in ZSAD and presents VisualAD, a purely visual framework built on Vision Transformers. We introduce two learnable tokens within a frozen backbone to directly encode normality and abnormality. Through multi-layer self-attention, these tokens interact with patch tokens, gradually acquiring high-level notions of normality and anomaly while guiding…
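
The token mechanism described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract only states that two learnable tokens sit inside a frozen backbone and interact with patch tokens through multi-layer self-attention. Everything else here is an assumption made for illustration, including the single-head attention layer, the random weights standing in for a frozen backbone, the hypothetical function names, and in particular the scoring rule (a softmax over cosine similarities between each patch and the two tokens).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention with a residual connection.

    tokens: (n_tokens, dim). The weights play the role of a frozen
    backbone layer; they are never updated in this sketch.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return tokens + attn @ v

def patch_anomaly_scores(patches, normal_tok, abnormal_tok, layers, temperature=0.1):
    """Hypothetical scoring rule (an assumption, not taken from the paper).

    Prepends the two tokens to the patch sequence, runs the attention
    layers, then scores each patch by how much closer it is to the
    abnormality token than to the normality token.
    """
    seq = np.vstack([normal_tok, abnormal_tok, patches])
    for w_q, w_k, w_v in layers:
        seq = self_attention(seq, w_q, w_k, w_v)
    toks, out = seq[:2], seq[2:]
    toks = toks / np.linalg.norm(toks, axis=1, keepdims=True)
    out = out / np.linalg.norm(out, axis=1, keepdims=True)
    logits = out @ toks.T                      # (n_patches, 2) cosine similarities
    return softmax(logits / temperature, axis=-1)[:, 1]

# Toy setup: a 4x4 grid of 16-dimensional patch embeddings and two layers.
rng = np.random.default_rng(0)
dim, grid = 16, 4
patches = rng.normal(size=(grid * grid, dim))
normal_tok = rng.normal(size=dim)      # would be learned in practice
abnormal_tok = rng.normal(size=dim)    # would be learned in practice
layers = [tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
          for _ in range(2)]

scores = patch_anomaly_scores(patches, normal_tok, abnormal_tok, layers)
anomaly_map = scores.reshape(grid, grid)   # per-patch abnormality in [0, 1]
image_score = float(anomaly_map.max())     # one possible image-level score
```

In a trained system only the two tokens (and any lightweight added modules) would receive gradients while the backbone weights stay fixed; taking the maximum of the map as the image-level score is likewise just one plausible choice.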

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning