Improving Vision Anomaly Detection with the Guidance of Language Modality
Dong Chen, Kaihang Pan, Guoming Wang, Yueting Zhuang, Siliang Tang

TL;DR
This paper combines the vision and language modalities to improve vision anomaly detection: language guidance reduces redundant information in images and helps learn a more meaningful latent space, improving on the baseline by 16.81%.
Contribution
The paper proposes Cross-modal Guidance (CMG), which consists of Cross-modal Entropy Reduction (CMER) and Cross-modal Linear Embedding (CMLE). CMG uses the language modality to focus the vision anomaly detector on relevant image content and to learn a better latent space.
Findings
CMG outperforms the baseline approach by 16.81% in anomaly detection.
CMER effectively masks irrelevant image parts based on text.
CMLE creates a semantically meaningful latent space for vision data.
Abstract
Recent years have seen a surge of interest in anomaly detection for tackling industrial defect detection, event detection, etc. However, existing unsupervised anomaly detectors, particularly those for the vision modality, face significant challenges due to redundant information and sparse latent space. Conversely, the language modality performs well due to its relatively single data. This paper tackles the aforementioned challenges for vision modality from a multimodal point of view. Specifically, we propose Cross-modal Guidance (CMG), which consists of Cross-modal Entropy Reduction (CMER) and Cross-modal Linear Embedding (CMLE), to tackle the redundant information issue and sparse space issue, respectively. CMER masks parts of the raw image and computes the matching score with the text. Then, CMER discards irrelevant pixels to make the detector focus on critical contents. To learn a…
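The CMER step described in the abstract (mask parts of the image, score each masked version against the text, discard irrelevant pixels) can be illustrated with a minimal sketch. This is not the authors' implementation: the patch-wise occlusion scheme, the `keep_ratio` threshold, and the names `patch_relevance`, `keep_relevant`, and `score_fn` are all assumptions made for illustration. `score_fn` stands in for whatever image-text matching model is used and is assumed to return a higher value for a better match.

```python
import numpy as np

def patch_relevance(image, score_fn, patch=4):
    """Relevance of each patch = drop in matching score when it is masked.

    Assumption: occlusion-style scoring; the paper's exact rule may differ.
    """
    h, w = image.shape[:2]
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    base = score_fn(image)  # matching score of the unmasked image
    rel = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            masked = image.copy()
            masked[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = 0
            rel[i, j] = base - score_fn(masked)
    return rel

def keep_relevant(image, rel, patch=4, keep_ratio=0.5):
    """Zero out pixels in patches outside the top `keep_ratio` by relevance."""
    k = max(1, int(round(rel.size * keep_ratio)))
    thresh = np.sort(rel.ravel())[-k]  # k-th largest relevance value
    keep = rel >= thresh               # ties at the threshold are kept
    # Expand the patch-level mask to pixel resolution.
    mask = np.repeat(np.repeat(keep, patch, axis=0), patch, axis=1)
    if image.ndim == 3:
        mask = mask[..., None]         # broadcast over channels
    return np.where(mask, image, 0)

# Toy usage: a hypothetical scorer that only "cares" about the top-left region.
toy_score = lambda img: float(img[:4, :4].mean())
img = np.ones((8, 8))
rel = patch_relevance(img, toy_score, patch=4)
focused = keep_relevant(img, rel, patch=4, keep_ratio=0.25)
```

In the toy usage, only the top-left patch changes the score when masked, so it is the only patch kept. A real scorer would be far more expensive to call, so the number of masked variants (here one per patch) is the main cost to watch.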
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Data-Driven Disease Surveillance · Network Security and Intrusion Detection
Methods: Focus
