Learning Accurate Segmentation Purely from Self-Supervision
Zuyao You, Zuxuan Wu, Yu-Gang Jiang

TL;DR
This paper introduces Selfment, a fully self-supervised framework for object segmentation that achieves state-of-the-art results without manual labels, using affinity graphs, iterative refinement, and contrastive learning.
Contribution
Selfment is presented as the first fully self-supervised method to produce accurate segmentation masks without human annotations, pretrained segmentation models, or post-processing, advancing unsupervised segmentation capabilities.
Findings
Sets a new state of the art on multiple benchmarks
Shows strong zero-shot generalization to camouflaged object detection
Outperforms existing unsupervised methods and rivals supervised ones
Abstract
Accurately segmenting objects without any manual annotations remains one of the core challenges in computer vision. In this work, we introduce Selfment, a fully self-supervised framework that segments foreground objects directly from raw images without human labels, pretrained segmentation models, or any post-processing. Selfment first constructs patch-level affinity graphs from self-supervised features and applies NCut to obtain an initial coarse foreground-background separation. We then introduce Iterative Patch Optimization (IPO), a feature-space refinement procedure that progressively enforces spatial coherence and semantic consistency through iterative patch clustering. The refined masks are subsequently used as supervisory signals to train a lightweight segmentation head with contrastive and region-consistency objectives, allowing the model to learn stable and transferable object…
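The first stage described above (a patch-level affinity graph followed by NCut) can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine-similarity affinity, the threshold `tau`, the floor value `eps`, and the function name `ncut_bipartition` are all assumptions made for illustration, and the demo uses synthetic features in place of real self-supervised patch embeddings.

```python
import numpy as np

def ncut_bipartition(features, tau=0.2, eps=1e-5):
    """Split N patches into two groups with a normalized cut.

    features: (N, D) array of patch embeddings. tau and eps are
    illustrative hyperparameters, not values taken from the paper.
    Returns a boolean array of shape (N,) marking one side of the cut.
    """
    # Cosine similarity between every pair of patches.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    # Assumed affinity: strong edges where similarity exceeds tau,
    # a tiny weight elsewhere so the graph stays connected.
    W = np.where(sim > tau, 1.0, eps)
    d = W.sum(axis=1)
    # Symmetric normalized Laplacian D^{-1/2} (D - W) D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = (np.diag(d) - W) * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest eigenvalue gives the relaxed NCut solution.
    _, vecs = np.linalg.eigh(L)
    fiedler = d_inv_sqrt * vecs[:, 1]
    # Threshold at the mean to obtain a two-way partition.
    return fiedler > fiedler.mean()

if __name__ == "__main__":
    # Synthetic stand-in for patch features: two well-separated groups.
    rng = np.random.default_rng(0)
    group_a = np.eye(16)[0] + 0.01 * rng.standard_normal((4, 16))
    group_b = np.eye(16)[1] + 0.01 * rng.standard_normal((6, 16))
    mask = ncut_bipartition(np.vstack([group_a, group_b]))
    print(mask.astype(int))
```

The cut itself does not say which side is the foreground; that decision needs an extra rule, which this sketch leaves out.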
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
