Selective Masking based Self-Supervised Learning for Image Semantic Segmentation
Yuemin Wang, Ian Stavness

TL;DR
This paper introduces a selective masking self-supervised pretraining approach for image semantic segmentation. By masking the image patches with the highest reconstruction loss, it outperforms random masking and supervised ImageNet pretraining, improving accuracy especially for low-performing classes.
Contribution
The paper presents a novel selective masking strategy that breaks image reconstruction pretraining into iterative steps and masks the patches with the highest reconstruction loss, replacing the random masking used in most masked image modelling methods.
Findings
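Outperforms random masking and supervised ImageNet pretraining on downstream segmentation accuracy by 2.9% on general datasets (Pascal VOC and Cityscapes) and 2.5% on weed segmentation datasets (Nassar 2020 and Sugarbeets 2016).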
Significantly improves accuracy for low-performing classes.
Effective for low-resource pretraining scenarios.
Abstract
This paper proposes a novel self-supervised learning method for semantic segmentation using selective masking image reconstruction as the pretraining task. Our proposed method replaces the random masking augmentation used in most masked image modelling pretraining methods. The proposed selective masking method selectively masks image patches with the highest reconstruction loss by breaking the image reconstruction pretraining into iterative steps to leverage the trained model's knowledge. We show on two general datasets (Pascal VOC and Cityscapes) and two weed segmentation datasets (Nassar 2020 and Sugarbeets 2016) that our proposed selective masking method outperforms the traditional random masking method and supervised ImageNet pretraining on downstream segmentation accuracy by 2.9% for general datasets and 2.5% for weed segmentation datasets. Furthermore, we found that our selective…
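The selection step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the use of mean squared error as the reconstruction loss, the patch size, the mask ratio, and the zero fill value are all assumptions made for illustration, and image height and width are assumed to be multiples of the patch size.

```python
import numpy as np


def patch_losses(image, reconstruction, patch_size):
    """Mean squared error for each non-overlapping patch (assumed loss).

    Assumes height and width are multiples of patch_size.
    Returns an array of shape (H // patch_size, W // patch_size).
    """
    err = (np.asarray(image, dtype=np.float64)
           - np.asarray(reconstruction, dtype=np.float64)) ** 2
    if err.ndim == 3:  # average over channels for colour images
        err = err.mean(axis=2)
    h, w = err.shape
    grid = err.reshape(h // patch_size, patch_size, w // patch_size, patch_size)
    return grid.mean(axis=(1, 3))


def select_mask(losses, mask_ratio):
    """Boolean patch grid that is True for the highest-loss patches."""
    flat = losses.ravel()
    k = int(round(mask_ratio * flat.size))
    top = np.argsort(flat)[::-1][:k]  # indices of the k largest losses
    mask = np.zeros(flat.size, dtype=bool)
    mask[top] = True
    return mask.reshape(losses.shape)


def apply_mask(image, mask, patch_size, fill=0):
    """Return a copy of image with the selected patches set to fill."""
    pixel_mask = mask.repeat(patch_size, axis=0).repeat(patch_size, axis=1)
    out = np.array(image, copy=True)
    out[pixel_mask] = fill
    return out
```

In an iterative scheme of the kind the abstract describes, `reconstruction` would come from the model trained in the preceding step, so the mask follows what that model currently reconstructs poorly. Random masking would instead choose the `k` patch indices uniformly at random.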
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
