Towards Open-Vocabulary Industrial Defect Understanding with a Large-Scale Multimodal Dataset
TsaiChing Ni, ZhenQi Chen, YuanFu Yang

TL;DR
This paper introduces IMDD-1M, a large-scale multimodal dataset for industrial defect understanding, and develops a diffusion-based vision-language model that achieves efficient, domain-adaptive defect detection and description with minimal task-specific data.
Contribution
The creation of the IMDD-1M dataset and the development of a diffusion-based vision-language model tailored to industrial defect analysis, enabling efficient domain adaptation.
Findings
The dataset contains 1 million image-text pairs spanning more than 60 material categories and more than 400 defect types.
The model achieves comparable performance using less than 5% of the task-specific data.
The work demonstrates effective application to classification, segmentation, retrieval, captioning, and generation.
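The claim about adapting with less than 5% of the task-specific data can be made concrete with a stratified subsample, which keeps the label distribution of a small training budget close to that of the full set. This is a minimal sketch under stated assumptions, not the authors' procedure: the function name `stratified_subset`, the `(item_id, label)` input format, and the keep-at-least-one-per-class rule are all hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_subset(samples, fraction=0.05, seed=0):
    """Sample about `fraction` of the items in each class label (hypothetical helper).

    `samples` is a list of (item_id, label) pairs. Each class keeps at least
    one item so rare defect types are not dropped entirely.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    by_label = defaultdict(list)
    for item_id, label in samples:
        by_label[label].append(item_id)
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    subset = []
    for label in sorted(by_label):  # sorted label order keeps results deterministic
        ids = by_label[label]
        k = max(1, math.floor(len(ids) * fraction))  # per-class budget, never zero
        subset.extend((item_id, label) for item_id in rng.sample(ids, k))
    return subset
```

Because every class keeps at least one item, the overall fraction can slightly exceed `fraction` when many classes are very small; dropping the `max(1, ...)` guard enforces a hard budget at the cost of losing the rarest defect types.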
Abstract
We present IMDD-1M, the first large-scale Industrial Multimodal Defect Dataset comprising 1,000,000 aligned image-text pairs, designed to advance multimodal learning for manufacturing and quality inspection. IMDD-1M contains high-resolution real-world defects spanning over 60 material categories and more than 400 defect types, each accompanied by expert-verified annotations and fine-grained textual descriptions detailing defect location, severity, and contextual attributes. This dataset enables a wide spectrum of applications, including classification, segmentation, retrieval, captioning, and generative modeling. Building upon IMDD-1M, we train a diffusion-based vision-language foundation model from scratch, specifically tailored for industrial scenarios. The model serves as a generalizable foundation that can be efficiently adapted to specialized domains through lightweight…
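The abstract states that each image-text pair carries annotations for defect location, severity, and contextual attributes. A minimal sketch of how such a record might be represented and queried is given below, assuming a schema that this summary does not specify: `DefectRecord`, its field names, the bounding-box convention, the severity scale in `SEVERITY_ORDER`, and `filter_records` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical severity scale; the actual IMDD-1M labels are not given in this summary.
SEVERITY_ORDER = {"minor": 0, "moderate": 1, "major": 2, "critical": 3}


@dataclass
class DefectRecord:
    """One image-text pair with defect annotations (hypothetical schema)."""
    image_path: str
    material: str      # material category, e.g. "steel"
    defect_type: str   # defect type label, e.g. "scratch"
    caption: str       # fine-grained textual description
    bbox: Optional[tuple[int, int, int, int]] = None  # assumed (x0, y0, x1, y1) pixel box for location
    severity: str = "unknown"
    attributes: dict[str, str] = field(default_factory=dict)  # contextual attributes


def filter_records(records, material=None, min_severity=None):
    """Return records matching a material and at or above a severity level."""
    threshold = None if min_severity is None else SEVERITY_ORDER[min_severity]
    selected = []
    for rec in records:
        if material is not None and rec.material != material:
            continue
        if threshold is not None:
            level = SEVERITY_ORDER.get(rec.severity)
            # Records with an unrecognized severity are excluded from severity queries.
            if level is None or level < threshold:
                continue
        selected.append(rec)
    return selected
```

For example, `filter_records(records, material="steel", min_severity="major")` would return only the steel records annotated as major or critical, the kind of slice one might use to build a severity-focused captioning or retrieval subset.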
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Machine Learning in Materials Science · Advanced Neural Network Applications
