Vision-Language In-Context Learning Driven Few-Shot Visual Inspection Model

Shiryu Ueno; Yoshikazu Hayashi; Shunsuke Nakatsuka; Yusei Yamada; Hiroaki Aizawa; Kunihito Kato

arXiv:2502.09057 · cs.CV · February 14, 2025

TL;DR

This paper introduces a novel few-shot visual inspection method using Vision-Language Models with in-context learning, enabling high-performance defect detection without extensive retraining for new products.

Contribution

It presents a fine-tuned VLM with in-context learning for visual inspection, reducing the need for large datasets and retraining for each new product.

Findings

01

Achieved an MCC of 0.804 on MVTec AD in the one-shot setting.

02

Achieved an F1-score of 0.950, indicating high defect-detection accuracy.

03

Eliminated the need for extensive retraining for new inspection tasks.
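Both reported metrics are standard binary-classification scores: MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) and F1 = 2TP / (2TP + FP + FN). The minimal sketch below computes them from confusion-matrix counts. Treating "defective" as the positive class is an assumption, since this page does not say which class the authors counted as positive, and the counts in the usage lines are illustrative only, not results from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any row or column sum is zero; 0.0 is a common convention.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def f1(tp: int, fp: int, fn: int) -> float:
    """F1-score (harmonic mean of precision and recall) for the positive class."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


# Illustrative counts only (not from the paper); "defective" assumed to be positive.
tp, tn, fp, fn = 90, 80, 20, 10
print(round(mcc(tp, tn, fp, fn), 3))  # → 0.704
print(round(f1(tp, fp, fn), 3))       # → 0.857
```

MCC uses all four cells of the confusion matrix and is unchanged if the positive and negative classes are swapped, whereas F1 ignores true negatives and therefore depends on which class is called positive, so the two numbers are complementary.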

Abstract

We propose a general visual inspection model that uses a Vision-Language Model (VLM) with few-shot images of non-defective or defective products, along with explanatory texts that serve as inspection criteria. Although existing VLMs exhibit high performance across various tasks, they are not trained on specific tasks such as visual inspection. We therefore construct a dataset of diverse images of non-defective and defective products collected from the web, paired with output text in a unified format, and fine-tune the VLM on it. For new products, our method employs In-Context Learning, which allows the model to perform inspection given an example non-defective or defective image and its explanatory text, together with visual prompts. This approach eliminates the need to collect a large number of training samples and re-train the model for each product. The experimental results show that our…
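The in-context inspection step can be pictured as assembling an interleaved image-and-text prompt: labeled example images with their explanatory texts first, then the query image. The sketch below is hypothetical throughout. The `IMAGE_TOKEN` placeholder, the `InContextExample`, `build_inspection_prompt`, and `parse_verdict` names, the prompt wording, and the "defective"/"non-defective" answer format are illustrative assumptions, not the authors' unified output format or the code in their repository. Visual prompts (for example, markings that highlight a defect region) are assumed to be drawn onto the example image file beforehand, and no actual VLM is called.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical placeholder marking where a VLM would splice in image features.
IMAGE_TOKEN = "<image>"


@dataclass
class InContextExample:
    image_path: str     # example image; any visual prompt is assumed already drawn on it
    is_defective: bool  # ground-truth label of the example
    explanation: str    # explanatory text acting as the inspection criterion


def build_inspection_prompt(
    examples: List[InContextExample], query_image_path: str
) -> Tuple[str, List[str]]:
    """Return (prompt text, image paths in the order their tokens appear)."""
    parts: List[str] = []
    images: List[str] = []
    for ex in examples:
        label = "defective" if ex.is_defective else "non-defective"
        parts.append(f"Example: {IMAGE_TOKEN}\nThis product is {label}. {ex.explanation}")
        images.append(ex.image_path)
    # The query comes last so the model conditions on the examples first.
    parts.append(f"Query: {IMAGE_TOKEN}\nIs this product defective? Answer in the same format.")
    images.append(query_image_path)
    return "\n\n".join(parts), images


def parse_verdict(output_text: str) -> bool:
    """Map a reply in the hypothetical answer format to True (defective) or False."""
    text = output_text.lower()
    # Check "non-defective" first because it contains the substring "defective".
    if "non-defective" in text:
        return False
    if "defective" in text:
        return True
    raise ValueError(f"unrecognized output: {output_text!r}")


if __name__ == "__main__":
    one_shot = [InContextExample("good_part.png", False, "The surface should be free of scratches.")]
    prompt, image_paths = build_inspection_prompt(one_shot, "query_part.png")
    print(prompt)
    print(image_paths)
```

In a one-shot setting, `examples` would hold a single item. Pairing each example image with its explanatory text mirrors the abstract's idea that inspection criteria are supplied per product at inference time rather than learned by retraining.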

Code & Models

Repositories

ia-gu/vision-language-in-context-learning-driven-few-shot-visual-inspection-model
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Infrastructure Maintenance and Monitoring · Multimodal Machine Learning Applications