Anomaly-Aware Vision-Language Adapters for Zero-Shot Anomaly Detection

Muhammad Aqeel; Maham Nazir; Uzair Khan; Marco Cristani; Francesco Setti

arXiv:2605.12069 · cs.CV · May 13, 2026

TL;DR

This paper introduces AVA-DINO, an anomaly-aware vision-language framework with separate branches for normal and anomalous patterns, and reports state-of-the-art zero-shot anomaly detection on nine industrial and medical benchmarks.

Contribution

The paper proposes a dual-branch, anomaly-aware adaptation framework that dynamically combines the features of the two branches for each input, improving zero-shot anomaly detection without domain-specific fine-tuning.

Findings

01 Achieves 93.5% image-AUROC on MVTec-AD.

02 Demonstrates strong cross-domain generalization to medical imaging.

03 Outperforms existing methods on nine industrial and medical benchmarks.
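For readers unfamiliar with the image-AUROC metric quoted in the findings, the sketch below shows one common way to compute it: as the fraction of (anomalous, normal) image pairs in which the anomalous image receives the higher score, with ties counted as half. The function name `image_auroc` and the toy labels and scores are illustrative assumptions, not values or code from the paper.

```python
import numpy as np


def image_auroc(labels, scores):
    """Image-level AUROC via pairwise ranking (hypothetical helper).

    labels: 1 for anomalous images, 0 for normal images.
    scores: one anomaly score per image, higher meaning more anomalous.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]    # scores of anomalous images
    neg = scores[~labels]   # scores of normal images
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one normal and one anomalous image")
    # Compare every anomalous score with every normal score.
    diff = pos[:, None] - neg[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / (pos.size * neg.size))


# Toy data (made up for illustration): two normal and two anomalous images.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = image_auroc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

A value of 1.0 means every anomalous image outranks every normal one; 0.5 corresponds to random ranking.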

Abstract

Zero-shot anomaly detection aims to identify defects in unseen categories without target-specific training. Existing methods usually apply the same feature transformation to all samples, treating normal and anomalous data uniformly despite their fundamentally asymmetric distributions: normal samples are compact, whereas anomalies are diverse. We instead exploit this natural asymmetry by proposing AVA-DINO, an anomaly-aware vision-language adaptation framework with dual specialized branches for normal and anomalous patterns that adapt frozen DINOv3 visual features. During training on auxiliary data, the two branches are learned jointly with a text-guided routing mechanism and explicit routing regularization that encourages branch specialization. At test time, only the input image and fixed, predefined language descriptions are used to dynamically combine the two branches, enabling an asymmetric…
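The dual-branch, text-guided combination described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the linear adapters `W_normal` and `W_anomalous`, the softmax gate, the `temperature` value, and the max-pooling into an image score are all assumptions made for illustration, and random arrays stand in for the frozen DINOv3 patch features and the fixed text embeddings. The routing regularization used during training is not shown.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def route_and_score(patches, W_normal, W_anomalous,
                    text_normal, text_anomalous, temperature=0.07):
    """Hypothetical dual-branch routing.

    patches:        (N, D) frozen visual patch features.
    W_normal:       (D, D) assumed linear adapter for the normal branch.
    W_anomalous:    (D, D) assumed linear adapter for the anomalous branch.
    text_normal:    (D,) fixed embedding of a "normal" description.
    text_anomalous: (D,) fixed embedding of an "anomalous" description.
    Returns the per-patch gate (N, 2) and per-patch anomaly scores (N,).
    """
    f = l2_normalize(patches)
    t = l2_normalize(np.stack([text_normal, text_anomalous]))  # (2, D)
    # Text-guided gate: each patch weights the two branches by its
    # similarity to the two fixed descriptions.
    gate = softmax(f @ t.T / temperature, axis=-1)              # (N, 2)
    z_normal = f @ W_normal
    z_anomalous = f @ W_anomalous
    fused = l2_normalize(gate[:, :1] * z_normal + gate[:, 1:] * z_anomalous)
    # Anomaly score: probability assigned to the "anomalous" description.
    probs = softmax(fused @ t.T / temperature, axis=-1)
    return gate, probs[:, 1]


# Stand-in inputs (random, for illustration only).
rng = np.random.default_rng(0)
D, N = 16, 5
gate, patch_scores = route_and_score(
    rng.normal(size=(N, D)),
    np.eye(D), np.eye(D),
    rng.normal(size=D), rng.normal(size=D),
)
image_score = patch_scores.max()  # assumed pooling: most anomalous patch
```

Because the gate depends on each input, a patch that resembles the anomalous description is transformed mostly by the anomalous branch, while other patches pass mostly through the normal branch. That is the asymmetric treatment the abstract contrasts with applying one transformation to all samples.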

Code & Models

Repositories

aqeeelmirza/AVA-DINO (GitHub)
