Boxes2Pixels: Learning Defect Segmentation from Noisy SAM Masks

Camile Lendering; Erkut Akdag; Egor Bondarev

arXiv:2604.11162 · cs.CV · April 14, 2026

TL;DR

Boxes2Pixels is a noise-robust framework that improves defect segmentation from noisy SAM pseudo-masks by treating SAM as a teacher rather than as ground truth and by incorporating self-correction. The result is higher segmentation accuracy from a more compact student model.

Contribution

The paper introduces a noise-robust distillation method for defect segmentation that uses SAM as a noisy teacher and applies self-correction to its pseudo-labels.

Findings

1. Improves anomaly mIoU by +6.97 over the baseline.
2. Increases binary IoU by +9.71 over the baseline.
3. Improves binary recall by +18.56 while using fewer parameters.
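The findings above are reported in three metrics. The following is a minimal sketch of how they are commonly computed, not the paper's evaluation code. It assumes masks are 2-D grids of integer labels with 0 as background, that "binary" metrics treat any defect class as foreground, and that "anomaly mIoU" averages IoU over defect classes only. The function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

Mask = List[List[int]]  # 2-D grid of class labels; 0 = background (assumption)


def _pixel_pairs(pred: Mask, gt: Mask) -> Iterator[Tuple[int, int]]:
    """Yield (predicted, ground-truth) label pairs pixel by pixel."""
    for pred_row, gt_row in zip(pred, gt):
        yield from zip(pred_row, gt_row)


def binary_iou_and_recall(pred: Mask, gt: Mask) -> Tuple[float, float]:
    """Foreground-vs-background IoU and recall, where any label > 0 is a defect."""
    tp = fp = fn = 0
    for p, g in _pixel_pairs(pred, gt):
        pred_fg, gt_fg = p > 0, g > 0
        if pred_fg and gt_fg:
            tp += 1
        elif pred_fg:
            fp += 1
        elif gt_fg:
            fn += 1
    union = tp + fp + fn
    iou = tp / union if union else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return iou, recall


def anomaly_miou(pred: Mask, gt: Mask, defect_classes: Sequence[int]) -> float:
    """Mean IoU over the given defect classes; classes absent from both masks are skipped."""
    ious = []
    for c in defect_classes:
        inter = union = 0
        for p, g in _pixel_pairs(pred, gt):
            in_pred, in_gt = p == c, g == c
            inter += in_pred and in_gt
            union += in_pred or in_gt
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


gt = [[0, 1, 1], [0, 2, 0]]
pred = [[0, 1, 0], [1, 2, 0]]
iou, recall = binary_iou_and_recall(pred, gt)
miou = anomaly_miou(pred, gt, defect_classes=[1, 2])
```

In this toy example, the prediction finds 2 of the 3 defect pixels and adds 1 false positive. Binary IoU is therefore 2/4 and recall is 2/3. The per-class IoUs are 1/3 for class 1 and 1 for class 2, so their mean is 2/3. Recall reacts only to missed foreground, which is why it can move much more than IoU when sparse defects stop being missed.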

Abstract

Accurate defect segmentation is critical for industrial inspection, yet dense pixel-level annotations are rarely available. A common workaround is to convert inexpensive bounding boxes into pseudo-masks using foundation segmentation models such as the Segment Anything Model (SAM). However, these pseudo-labels are systematically noisy on industrial surfaces: they often hallucinate background structure while missing sparse defects. To address this limitation, Boxes2Pixels, a noise-robust box-to-pixel distillation framework, is proposed; it treats SAM as a noisy teacher rather than as a source of ground-truth supervision. Bounding boxes are converted into pseudo-masks offline by SAM, and a compact student is trained with (i) a hierarchical decoder over frozen DINOv2 features for semantic stability, (ii) an auxiliary binary localization head to decouple sparse foreground discovery from class…
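The abstract names an auxiliary binary localization head that is trained alongside the multi-class decoder. A minimal sketch of one plausible way to combine the two objectives follows. It assumes that the binary target is derived from the SAM pseudo-mask (any label > 0 counts as foreground), that the class head uses per-pixel softmax cross-entropy, and that the binary head uses sigmoid binary cross-entropy. It also assumes a hypothetical weight `aux_weight`. None of these choices is confirmed by the abstract. The frozen DINOv2 backbone and hierarchical decoder are out of scope here; plain logit lists stand in for their outputs, with pixels flattened to 1-D.

```python
import math
from typing import List, Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Per-pixel multi-class loss, using log-sum-exp for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def bce_with_logit(logit: float, target: float) -> float:
    """Stable sigmoid BCE: max(x, 0) - x * t + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def student_loss(
    class_logits: List[Sequence[float]],  # one list of C class logits per pixel
    fg_logits: List[float],               # one foreground logit per pixel (aux head)
    pseudo_mask: List[int],               # SAM pseudo-label per pixel, 0 = background
    aux_weight: float = 0.5,              # hypothetical weighting, not from the paper
) -> float:
    n = len(pseudo_mask)
    # Multi-class term: supervised directly by the (noisy) SAM class labels.
    seg = sum(softmax_cross_entropy(l, y) for l, y in zip(class_logits, pseudo_mask)) / n
    # Binary term: only asks "is this pixel a defect at all?", separating
    # foreground discovery from deciding which defect class it is.
    aux = sum(
        bce_with_logit(f, 1.0 if y > 0 else 0.0) for f, y in zip(fg_logits, pseudo_mask)
    ) / n
    return seg + aux_weight * aux


uniform = student_loss([[0.0, 0.0, 0.0]] * 4, [0.0] * 4, [0, 1, 2, 0])
confident = student_loss(
    [[9.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 9.0], [9.0, 0.0, 0.0]],
    [-9.0, 9.0, 9.0, -9.0],
    [0, 1, 2, 0],
)
```

With all-zero logits, every pixel costs log 3 for the class term and log 2 for the binary term, so `uniform` equals log 3 + 0.5 · log 2. Logits that agree with the pseudo-mask drive both terms toward zero. Keeping the foreground signal in its own head means a pixel can be pushed toward "defect" even while its class assignment is still uncertain.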

Code & Models

Repositories

CLendering/Boxes2Pixels (GitHub)
