Anomaly Object Segmentation with Vision-Language Models for Steel Scrap Recycling

Daichi Tanaka; Takumi Karasawa; Shu Takenouchi; Rei Kawakami

arXiv:2506.13282 · cs.CV · June 17, 2025

Open Access

TL;DR

This paper introduces a vision-language model fine-tuned for fine-grained anomaly detection in steel scrap recycling, aiming to improve impurity identification and reduce CO2 emissions.

Contribution

It presents a novel supervised fine-tuning approach for a vision-language model, combining a multi-scale mechanism with text prompts, for anomaly detection in steel scrap.

Findings

01 Effective anomaly detection at a fine-grained level

02 Improved impurity identification accuracy

03 Potential reduction in CO2 emissions

Abstract

Recycling steel scrap can reduce carbon dioxide (CO2) emissions from the steel industry. However, a significant challenge in steel scrap recycling is the inclusion of impurities other than steel. To address this issue, we propose vision-language-model-based anomaly detection in which the model is finetuned in a supervised manner, enabling it to handle niche objects effectively. This model enables automated detection of anomalies at a fine-grained level within steel scrap. Specifically, we finetune the image encoder, which is equipped with a multi-scale mechanism, together with text prompts aligned with both normal and anomaly images. The finetuning process trains these modules using multiclass classification as the supervision.
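
The abstract gives no implementation details, so the following is only a minimal sketch of how prompt-based multiclass supervision over several scales might look. Everything concrete here is an assumption for illustration, not the authors' method: the class names, the toy 3-dimensional embeddings, the CLIP-style cosine similarity with a temperature, and the choice to average class probabilities across scales. In a real system the image embeddings would come from the finetuned image encoder and the prompt embeddings from the learned text prompts.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_logits(image_emb, prompt_embs, temperature=0.07):
    """Cosine similarity between one image embedding and each class prompt,
    divided by a temperature (assumed value, CLIP-style)."""
    img = normalize(image_emb)
    return [
        sum(a * b for a, b in zip(img, normalize(p))) / temperature
        for p in prompt_embs
    ]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def multiscale_probs(embs_per_scale, prompt_embs):
    """Average per-class probabilities over the scales (an assumed fusion rule)."""
    per_scale = [softmax(cosine_logits(e, prompt_embs)) for e in embs_per_scale]
    num_classes = len(prompt_embs)
    return [
        sum(p[c] for p in per_scale) / len(per_scale)
        for c in range(num_classes)
    ]

def cross_entropy(probs, label):
    """Multiclass classification loss for one labeled region."""
    return -math.log(probs[label])

# Hypothetical classes: one normal class and two impurity (anomaly) classes.
CLASSES = ["normal steel scrap", "anomaly A", "anomaly B"]

# Toy stand-ins for learned prompt embeddings, one per class.
prompt_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Toy stand-ins for embeddings of the same region at two scales.
embs_per_scale = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]

probs = multiscale_probs(embs_per_scale, prompt_embs)
loss = cross_entropy(probs, label=0)  # ground truth: normal steel scrap
predicted = CLASSES[probs.index(max(probs))]
```

During finetuning, a loss of this kind would be minimized with respect to the encoder and prompt parameters; at inference, the highest-probability class for each region would mark it as normal or as a specific kind of impurity.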

Taxonomy

Topics: Mineral Processing and Grinding · Industrial Vision Systems and Defect Detection