Leveraging Hidden Positives for Unsupervised Semantic Segmentation
Hyun Seok Seong, WonJun Moon, SuBeen Lee, Jae-Pil Heo

TL;DR
This paper introduces a novel unsupervised semantic segmentation method that leverages hidden positive samples through contrastive learning, improving local semantic consistency and achieving state-of-the-art results on multiple datasets.
Contribution
The paper proposes a way to mine hidden positives for contrastive learning, together with a gradient propagation strategy, to improve semantic segmentation without supervision.
Findings
The method achieves state-of-the-art performance on the COCO-Stuff, Cityscapes, and Potsdam-3 datasets.
Gradually increasing the contribution of task-specific hidden positives leads the model to capture task-specific semantic features.
A novel gradient propagation strategy enforces semantic consistency in local regions.
Abstract
Dramatic demand for manpower to label pixel-level annotations triggered the advent of unsupervised semantic segmentation. Although the recent work employing the vision transformer (ViT) backbone shows exceptional performance, there is still a lack of consideration for task-specific training guidance and local semantic consistency. To tackle these issues, we leverage contrastive learning by excavating hidden positives to learn rich semantic relationships and ensure semantic consistency in local regions. Specifically, we first discover two types of global hidden positives, task-agnostic and task-specific ones for each anchor based on the feature similarities defined by a fixed pre-trained backbone and a segmentation head-in-training, respectively. A gradual increase in the contribution of the latter induces the model to capture task-specific semantic features. In addition, we introduce a…
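The hidden-positive idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the fixed similarity threshold, the linear ramp in `task_specific_weight`, the InfoNCE-style loss, and all function names are assumptions made for illustration. The abstract states only that positives are selected by feature similarity (from a fixed backbone and from the segmentation head) and that the task-specific contribution grows gradually. For simplicity, each anchor also counts as its own positive here.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to (approximately) unit length so dot products act as cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def hidden_positive_mask(feats, threshold):
    # Boolean (N, N) mask: entry (i, j) is True when feature j is similar enough to anchor i.
    # The fixed threshold is an assumption, not the paper's selection rule.
    normed = l2_normalize(feats)
    return (normed @ normed.T) > threshold

def task_specific_weight(step, total_steps):
    # Assumed linear ramp from 0 to 1; the source only says the contribution increases gradually.
    return float(min(1.0, max(0.0, step / total_steps)))

def contrastive_loss(head_feats, pos_mask, temperature=0.1):
    # InfoNCE-style loss, averaged over each anchor's positives.
    normed = l2_normalize(head_feats)
    logits = normed @ normed.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos_count = np.maximum(pos_mask.sum(axis=1), 1)
    per_anchor = -(log_prob * pos_mask).sum(axis=1) / pos_count
    return float(per_anchor.mean())

def combined_loss(backbone_feats, head_feats, step, total_steps, threshold=0.5):
    # Task-agnostic positives come from the fixed backbone features,
    # task-specific positives from the segmentation-head features.
    agnostic = hidden_positive_mask(backbone_feats, threshold)
    specific = hidden_positive_mask(head_feats, threshold)
    w = task_specific_weight(step, total_steps)
    return (contrastive_loss(head_feats, agnostic)
            + w * contrastive_loss(head_feats, specific))
```

Early in training (`step` near 0) the loss relies almost entirely on positives chosen by the pre-trained backbone; as `step` approaches `total_steps`, positives chosen by the head in training contribute equally.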
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Layer Normalization · Softmax · Contrastive Learning · Residual Connection · Vision Transformer
