TL;DR
This paper introduces AVA-DINO, an anomaly-aware vision-language framework with dual branches for normal and anomalous patterns, achieving state-of-the-art zero-shot anomaly detection on nine industrial and medical benchmarks.
Contribution
It proposes a dual-branch, anomaly-aware adaptation framework that dynamically combines the features of its normal and anomalous branches, improving zero-shot anomaly detection without domain-specific fine-tuning.
Findings
Achieves 93.5% image-AUROC on MVTec-AD.
Demonstrates strong cross-domain generalization to medical imaging.
Outperforms existing methods on nine industrial and medical benchmarks.
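For readers unfamiliar with the image-AUROC metric quoted above, the following is a minimal sketch of how it can be computed from per-image anomaly scores. It uses the standard pairwise (Mann-Whitney) definition of AUROC; the function name `image_auroc` and the toy scores are illustrative assumptions and are not taken from the paper.

```python
def image_auroc(scores, labels):
    """Image-level AUROC: the probability that a randomly chosen anomalous
    image receives a higher anomaly score than a randomly chosen normal one.
    Ties count as half a win. labels: 1 = anomalous, 0 = normal."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): one anomalous image scores below a normal one.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
auroc = image_auroc(scores, labels)
print(auroc)
```

The pairwise form is quadratic in the number of images, which is fine for illustration; a rank-based implementation would be preferable on large test sets.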
Abstract
Zero-shot anomaly detection aims to identify defects in unseen categories without target-specific training. Existing methods usually apply the same feature transformation to all samples, treating normal and anomalous data uniformly despite their fundamentally asymmetric distributions: compact normals versus diverse anomalies. We instead exploit this natural asymmetry by proposing AVA-DINO, an anomaly-aware vision-language adaptation framework with dual specialized branches for normal and anomalous patterns that adapt frozen DINOv3 visual features. During training on auxiliary data, the two branches are learned jointly with a text-guided routing mechanism and explicit routing regularization that encourages branch specialization. At test time, only the input image and fixed, predefined language descriptions are used to dynamically combine the two branches, enabling an asymmetric…
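The routing idea in the abstract can be illustrated with a rough sketch. Everything specific here is an assumption made for illustration, not the paper's actual design: the residual linear adapters, the softmax over cosine similarities between the image feature and two text embeddings, the `temperature` value, and all names (`route_and_combine`, `W_normal`, `W_anom`, `t_normal`, `t_anom`). The abstract only states that two branches adapt frozen features and are combined dynamically using the image and fixed language descriptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom > 0 else 0.0


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def residual_adapter(W, x):
    """Hypothetical branch: frozen feature plus a learned linear correction."""
    return [xi + dot(row, x) for xi, row in zip(x, W)]


def route_and_combine(feat, W_normal, W_anom, t_normal, t_anom, temperature=0.1):
    """Assumed routing: weight each branch by how closely the frozen image
    feature matches the 'normal' and 'anomalous' text embeddings."""
    logits = [
        cosine(feat, t_normal) / temperature,
        cosine(feat, t_anom) / temperature,
    ]
    w_normal, w_anom = softmax(logits)
    z_normal = residual_adapter(W_normal, feat)
    z_anom = residual_adapter(W_anom, feat)
    combined = [w_normal * a + w_anom * b for a, b in zip(z_normal, z_anom)]
    return combined, (w_normal, w_anom)


# Toy 3-dimensional example with made-up embeddings and adapter weights.
t_normal = [1.0, 0.0, 0.0]
t_anom = [0.0, 1.0, 0.0]
W_normal = [[0.0] * 3 for _ in range(3)]
W_anom = [[0.1 if i == j else 0.0 for j in range(3)] for i in range(3)]
feat = [0.2, 0.9, 0.1]  # stands in for a frozen backbone feature

combined, weights = route_and_combine(feat, W_normal, W_anom, t_normal, t_anom)
print(weights)
```

In this toy case the feature lies closer to the anomalous text embedding, so the anomalous branch receives most of the routing weight. The routing regularization used during training is not modeled here.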
