Customizing Visual-Language Foundation Models for Multi-modal Anomaly Detection and Reasoning

Xiaohao Xu; Yunkang Cao; Huaxin Zhang; Nong Sang; Xiaonan Huang

arXiv:2403.11083 · cs.CV · May 21, 2025 · 1 citation

TL;DR

This paper presents a generic multi-modal anomaly detection framework that leverages visual-language foundation models with domain knowledge prompts, enabling robust detection and reasoning across diverse industrial scenarios and data modalities.

Contribution

It introduces a multi-modal prompting strategy and unified input representation to customize foundation models for versatile anomaly detection and reasoning tasks.

Findings

01. Enhanced anomaly detection performance with visual and language prompts

02. Effective detection across images, point clouds, and videos

03. Demonstrated capabilities in multi-object and temporal data scenarios

Abstract

Anomaly detection is vital in various industrial scenarios, including the identification of unusual patterns in production lines and the detection of manufacturing defects for quality control. Existing techniques tend to be specialized in individual scenarios and lack generalization capacities. In this study, our objective is to develop a generic anomaly detection model that can be applied in multiple scenarios. To achieve this, we custom-build generic visual language foundation models that possess extensive knowledge and robust reasoning abilities as anomaly detectors and reasoners. Specifically, we introduce a multi-modal prompting strategy that incorporates domain knowledge from experts as conditions to guide the models. Our approach considers diverse prompt types, including task descriptions, class context, normality rules, and reference images. In addition, we unify the input…
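
The four prompt types named in the abstract (task descriptions, class context, normality rules, and reference images) can be pictured as fields of one structured prompt. The sketch below is only an illustration of that idea, not the authors' implementation: the `AnomalyPrompt` class, its field names, the example strings, and the file path are all hypothetical, and no real model is called.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnomalyPrompt:
    """Hypothetical container for the four prompt types listed in the abstract."""
    task_description: str
    class_context: str
    normality_rules: List[str] = field(default_factory=list)
    reference_images: List[str] = field(default_factory=list)  # illustrative paths

    def to_text(self) -> str:
        """Join the textual parts into a single instruction string."""
        lines = [
            f"Task: {self.task_description}",
            f"Class context: {self.class_context}",
        ]
        if self.normality_rules:
            lines.append("Normality rules:")
            lines.extend(
                f"{i}. {rule}"
                for i, rule in enumerate(self.normality_rules, start=1)
            )
        if self.reference_images:
            # Images would be passed to the model separately; only the count
            # is mentioned in the text part of the prompt.
            lines.append(
                f"Reference images of normal samples attached: "
                f"{len(self.reference_images)}"
            )
        lines.append("Is the query sample anomalous? Answer yes or no, then explain.")
        return "\n".join(lines)


# Example values are made up for illustration only.
prompt = AnomalyPrompt(
    task_description="Inspect the query image for manufacturing defects.",
    class_context="The object is a metal screw.",
    normality_rules=["The thread is continuous.", "The head has no scratches."],
    reference_images=["normal_screw_01.png"],
)
text = prompt.to_text()
print(text)
```

Keeping each prompt type in its own field makes it easy to drop or swap one of them, for example to compare results with and without normality rules.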

Code & Models

Repositories

xiaohao-xu/customizable-vlm (PyTorch, official)

Taxonomy

Topics: Web Data Mining and Analysis · Anomaly Detection Techniques and Applications