TL;DR
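Boxes2Pixels is a noise-robust box-to-pixel distillation framework for defect segmentation. It treats SAM pseudo-masks generated from bounding boxes as a noisy teacher rather than as ground truth and adds self-correction, improving anomaly mIoU, binary IoU, and binary recall over the baseline while using fewer parameters.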
Contribution
The paper introduces a noise-robust box-to-pixel distillation method for defect segmentation that uses SAM as a noisy teacher and applies self-correction.
Findings
Improves anomaly mIoU by +6.97 over baseline.
Increases binary IoU by +9.71 over baseline.
Enhances binary recall by +18.56 with fewer parameters.
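The figures above are overlap metrics between predicted and reference masks. The following is a minimal sketch of how binary IoU and binary recall are commonly computed. The paper's exact evaluation protocol is not given on this page, and the function names and the convention that class id 0 means background are assumptions made for illustration.

```python
from typing import Sequence, Tuple

# A mask is a 2D grid of integer class ids.
# Assumption for this sketch: id 0 is background, any other id is a defect.
Mask = Sequence[Sequence[int]]


def binary_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Count foreground true positives, false positives, and false negatives."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            pred_fg, target_fg = p != 0, t != 0
            if pred_fg and target_fg:
                tp += 1
            elif pred_fg:
                fp += 1
            elif target_fg:
                fn += 1
    return tp, fp, fn


def binary_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of the foreground regions."""
    tp, fp, fn = binary_counts(pred, target)
    union = tp + fp + fn
    # Two empty masks agree perfectly; treat that case as IoU = 1.
    return tp / union if union else 1.0


def binary_recall(pred: Mask, target: Mask) -> float:
    """Fraction of reference foreground pixels that the prediction recovers."""
    tp, _, fn = binary_counts(pred, target)
    return tp / (tp + fn) if (tp + fn) else 1.0


# Toy example: 3 reference defect pixels, 2 of them found, 1 false alarm.
target = [[0, 1, 1],
          [0, 0, 1]]
pred = [[0, 1, 0],
        [1, 0, 1]]

iou = binary_iou(pred, target)        # 2 / (2 + 1 + 1)
recall = binary_recall(pred, target)  # 2 / (2 + 1)
```

On the toy masks this gives an IoU of 0.5 and a recall of 2/3. Recall ignores false alarms, which is why it can move by a different amount than IoU when a model finds more sparse defect pixels.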
Abstract
Accurate defect segmentation is critical for industrial inspection, yet dense pixel-level annotations are rarely available. A common workaround is to convert inexpensive bounding boxes into pseudo-masks using foundation segmentation models such as the Segment Anything Model (SAM). However, these pseudo-labels are systematically noisy on industrial surfaces, often hallucinating background structure while missing sparse defects. To address this limitation, a noise-robust box-to-pixel distillation framework, Boxes2Pixels, is proposed that treats SAM as a noisy teacher rather than a source of ground-truth supervision. Bounding boxes are converted into pseudo-masks offline by SAM, and a compact student is trained with (i) a hierarchical decoder over frozen DINOv2 features for semantic stability, (ii) an auxiliary binary localization head to decouple sparse foreground discovery from class…
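As a hedged illustration of item (ii), one plausible way to build a training target for a binary localization head is to collapse a multi-class pseudo-mask into foreground versus background. This is an assumption, not the paper's stated procedure: the helper names and the choice of 0 as the background id are hypothetical.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]


def to_binary_target(class_mask: Mask, background_id: int = 0) -> List[List[int]]:
    """Collapse class ids into 1 (any defect class) or 0 (background).

    Assumption for this sketch: the binary head only needs to know where
    a defect is, not which class it belongs to.
    """
    return [[int(c != background_id) for c in row] for row in class_mask]


def foreground_fraction(binary_mask: Mask) -> float:
    """Share of pixels marked as foreground; small values indicate sparse defects."""
    total = sum(len(row) for row in binary_mask)
    positives = sum(sum(row) for row in binary_mask)
    return positives / total if total else 0.0


# Toy pseudo-mask with two defect classes (ids 2 and 3) on a background of 0.
pseudo_mask = [[0, 2, 0],
               [3, 0, 0]]

binary_target = to_binary_target(pseudo_mask)
sparsity = foreground_fraction(binary_target)
```

Separating the "where" signal from the "which class" signal in this way means the localization target stays usable even when the class assignment inside a pseudo-mask is unreliable.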
