TL;DR
SteelDefectX introduces a comprehensive vision-language dataset and benchmark for steel surface defect analysis, enabling more nuanced semantic understanding and evaluation of vision-language models in industrial settings.
Contribution
SteelDefectX pairs 7,778 steel surface images across 25 defect categories with multi-form textual annotations at both the class level and the sample level, and establishes a benchmark covering several vision-language tasks in steel defect analysis.
Findings
Structured attributes yield stable semantic alignment.
Natural language descriptions enhance transferability.
Textual representation design impacts model performance.
Abstract
Steel surface defect analysis is critical for industrial quality control, yet existing benchmarks rely primarily on label-only annotations, limiting fine-grained semantic understanding and systematic evaluation of vision-language models. To address this gap, we introduce SteelDefectX, a vision-language dataset with multi-form textual annotations for steel surface defect analysis, comprising 7,778 images across 25 defect categories. At the class level, the dataset provides defect names, representative visual attributes, and industrial causes. At the sample level, each image is annotated with three forms of textual representations: (1) free-form natural language descriptions, (2) structured attribute annotations, and (3) template-based sentences. These annotations provide flexible textual supervision with varying levels of expressiveness and controllability. We further establish a…
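The two annotation levels described in the abstract can be pictured as a small data structure. The following is a minimal sketch only: every class name, field name, example value, and the template wording are hypothetical and are not taken from the SteelDefectX release, whose actual file format is not given here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClassAnnotation:
    """Class-level record: one per defect category (hypothetical schema)."""
    name: str
    visual_attributes: List[str]
    industrial_causes: List[str]


@dataclass
class SampleAnnotation:
    """Sample-level record: one per image (hypothetical schema)."""
    image_path: str
    class_name: str
    description: str            # (1) free-form natural language description
    attributes: Dict[str, str]  # (2) structured attribute annotations

    def template_sentence(self) -> str:
        # (3) template-based sentence rendered from the structured attributes.
        # The template wording is an assumption made for illustration.
        parts = ", ".join(f"{key}: {value}" for key, value in self.attributes.items())
        return f"A steel surface image showing a {self.class_name} defect ({parts})."


# Illustrative values only, not real dataset entries.
scratch = ClassAnnotation(
    name="scratch",
    visual_attributes=["thin", "linear"],
    industrial_causes=["contact with a hard object during handling"],
)

sample = SampleAnnotation(
    image_path="images/example_0001.jpg",
    class_name=scratch.name,
    description="A thin bright line runs across the middle of the plate.",
    attributes={"shape": "linear", "location": "center"},
)

sentence = sample.template_sentence()
```

Keeping the template sentence as a function of the structured attributes reflects the trade-off the abstract mentions: the free-form description is more expressive, while the attribute and template forms are easier to control and compare across samples.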
