SOWA: Adapting Hierarchical Frozen Window Self-Attention to Visual-Language Models for Better Anomaly Detection

Zongxiang Hu; Zhaosheng Zhang

arXiv:2407.03634 · cs.CV · November 20, 2024 · 1 citation

TL;DR

This paper introduces SOWA, a hierarchical window self-attention mechanism for vision-language models that improves anomaly detection accuracy in industrial settings by leveraging multi-level features and learnable prompts.

Contribution

The paper proposes a hierarchical window self-attention method built on CLIP that improves anomaly detection by making better use of multi-level features, leading existing methods on 18 of 20 metrics across five benchmark datasets.

Findings

01. Achieved top performance on 18 out of 20 metrics across five datasets.
02. Outperformed state-of-the-art anomaly detection techniques.
03. Demonstrated robustness and scalability in industrial applications.

Abstract

Visual anomaly detection is essential in industrial manufacturing, yet traditional methods often rely heavily on extensive normal datasets and task-specific models, limiting their scalability. Recent advancements in large-scale vision-language models have significantly enhanced zero- and few-shot anomaly detection. However, these approaches may not fully leverage hierarchical features, potentially overlooking nuanced details crucial for accurate detection. To address this, we introduce a novel window self-attention mechanism based on the CLIP model, augmented with learnable prompts to process multi-level features within a Soldier-Officer Window Self-Attention (SOWA) framework. Our method has been rigorously evaluated on five benchmark datasets, achieving superior performance by leading in 18 out of 20 metrics, setting a new standard against existing state-of-the-art techniques.
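The abstract names window self-attention over CLIP features without spelling out the computation. As a rough illustration only, the sketch below shows generic (Swin-style) window self-attention on a grid of patch features using NumPy. It is not the authors' SOWA implementation: the function names, the random projection matrices, and the 8x8 grid with 4x4 windows are all assumptions made for the example, and the learnable prompts and multi-level feature fusion described in the paper are not modeled.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W, C) token grid into (num_windows, window*window, C)."""
    H, W, C = x.shape
    assert H % window == 0 and W % window == 0, "grid must tile evenly"
    x = x.reshape(H // window, window, W // window, window, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, C)

def window_reverse(windows, window, H, W):
    """Inverse of window_partition: back to an (H, W, C) grid."""
    C = windows.shape[-1]
    x = windows.reshape(H // window, W // window, window, window, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, window, w_q, w_k, w_v):
    """Single-head self-attention computed independently inside each window."""
    H, W, _ = x.shape
    tokens = window_partition(x, window)               # (nW, n, C)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)                    # rows sum to 1
    out = attn @ v                                     # (nW, n, C)
    return window_reverse(out, window, H, W), attn

# Hypothetical stand-in for one level of patch features (8x8 grid, 16 channels).
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 16))
# Illustrative random projections; a real model would use trained weights.
w_q, w_k, w_v = (rng.normal(size=(16, 16)) * 0.1 for _ in range(3))

out, attn = window_self_attention(feats, 4, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (8, 8, 16) (4, 16, 16)
```

Because attention is restricted to each window, a token only mixes with its spatial neighbours, which keeps the cost per window fixed as the grid grows. In a hierarchical setting one would presumably apply such a block to features taken from several encoder depths, but how SOWA combines them is described in the paper, not here.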

Code & Models

Repositories

huzongxiang/sowa (PyTorch, official)

Models

zongxiang/sowa (Hugging Face model)

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection

Methods: Contrastive Language-Image Pre-training