Sanitizing Manufacturing Dataset Labels Using Vision-Language Models
Nazanin Mahjourian, Vinh Nguyen

TL;DR
This paper presents VLSR, a vision-language framework that uses CLIP embeddings to identify and correct noisy labels in manufacturing datasets, improving data quality for machine learning.
Contribution
It introduces a novel vision-language-based approach for label sanitization and refinement in manufacturing datasets, reducing label noise and improving consistency.
Findings
Effective identification of irrelevant and misspelled labels
Significant reduction in label vocabulary size
Enhanced dataset quality for industrial ML applications
Abstract
The success of machine learning models in industrial applications depends heavily on the quality of the datasets used to train them. However, large-scale datasets, especially those constructed through crowdsourcing and web scraping, often suffer from label noise, inconsistencies, and errors. This problem is particularly pronounced in manufacturing domains, where obtaining high-quality labels is costly and time-consuming. This paper introduces Vision-Language Sanitization and Refinement (VLSR), a vision-language-based framework for label sanitization and refinement in multi-label manufacturing image datasets. The method uses the CLIP vision-language model to embed both images and their associated textual labels into a shared semantic space. Two key tasks are then addressed by computing the cosine similarity between embeddings. First, label…
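The cosine-similarity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the embeddings below are small hand-written vectors standing in for CLIP outputs, and the function names and the 0.5 threshold are hypothetical choices made only for illustration. The sketch scores each label of one image against the image embedding and flags the labels whose similarity falls below the threshold as candidates for review.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def flag_low_similarity_labels(image_emb, label_embs, labels, threshold):
    """Return the labels whose embedding is weakly aligned with the image.

    `threshold` is a hypothetical cutoff; the source does not state one.
    """
    sims = cosine_similarity_matrix(image_emb[None, :], label_embs)[0]
    return [label for label, sim in zip(labels, sims) if sim < threshold]


# Toy stand-ins for CLIP embeddings (assumed values, not real model output).
image_emb = np.array([1.0, 0.0, 0.0])
label_embs = np.array([
    [0.9, 0.1, 0.0],  # points nearly the same way as the image embedding
    [0.0, 1.0, 0.0],  # orthogonal to the image embedding
])
labels = ["gear", "sunset"]

flagged = flag_low_similarity_labels(image_emb, label_embs, labels, threshold=0.5)
print(flagged)
```

In practice the vectors would come from a CLIP image encoder and text encoder, and the cutoff would need to be tuned on the dataset at hand.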
Taxonomy
Topics: Machine Learning and Data Classification · Text and Document Classification Technologies · Domain Adaptation and Few-Shot Learning
