TL;DR
PiCIE introduces an unsupervised semantic segmentation framework that leverages invariance and equivariance principles to learn high-level semantic concepts from uncurated, multi-label images without annotations.
Contribution
It extends clustering from whole images to individual pixels and incorporates geometric consistency as an inductive bias, enabling segmentation of both objects and scenes without hyperparameter tuning.
Findings
Outperforms baselines on COCO and Cityscapes with significant accuracy and mIoU improvements.
Learns high-level semantic concepts without annotations.
Provides better initialization for supervised training.
Abstract
We present a new framework for semantic segmentation without annotations via clustering. Off-the-shelf clustering methods are limited to curated, single-label, and object-centric images yet real-world data are dominantly uncurated, multi-label, and scene-centric. We extend clustering from images to pixels and assign separate cluster membership to different instances within each image. However, solely relying on pixel-wise feature similarity fails to learn high-level semantic concepts and overfits to low-level visual cues. We propose a method to incorporate geometric consistency as an inductive bias to learn invariance and equivariance for photometric and geometric variations. With our novel learning objective, our framework can learn high-level semantic concepts. Our method, PiCIE (Pixel-level feature Clustering using Invariance and Equivariance), is the first method capable of…
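The two properties named in the abstract can be illustrated with a minimal sketch. It is not the paper's training objective or network: the per-pixel embedding (L2-normalised RGB), the fixed random centroids, and the helper names `pixel_features`, `assign_clusters`, `hflip`, and `brighten` are all assumptions made for illustration. The sketch only checks what the properties mean for a pixel-level clustering: a photometric change (brightness gain) should leave the label map unchanged (invariance), while a geometric change (horizontal flip) should move the label map the same way it moves the image (equivariance).

```python
import numpy as np

def pixel_features(img):
    """Toy per-pixel embedding (an assumption): L2-normalised RGB."""
    norm = np.linalg.norm(img, axis=-1, keepdims=True)
    return img / np.maximum(norm, 1e-8)

def assign_clusters(feats, centroids):
    """Label each pixel with the index of its nearest centroid."""
    h, w, d = feats.shape
    flat = feats.reshape(-1, d)
    # Pairwise Euclidean distances, shape (num_pixels, num_centroids).
    dists = np.linalg.norm(flat[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1).reshape(h, w)

def hflip(x):
    """Geometric transform: flip along the width axis."""
    return x[:, ::-1]

def brighten(img, gain=1.5):
    """Photometric transform: uniform brightness gain."""
    return img * gain

rng = np.random.default_rng(0)
image = rng.uniform(0.1, 1.0, size=(8, 8, 3))      # hypothetical RGB image
centroids = pixel_features(rng.uniform(0.1, 1.0, size=(1, 4, 3)))[0]  # 4 clusters

labels = assign_clusters(pixel_features(image), centroids)
labels_flipped = assign_clusters(pixel_features(hflip(image)), centroids)
labels_bright = assign_clusters(pixel_features(brighten(image)), centroids)

# Equivariance: labelling the flipped image equals flipping the labels.
equivariant = np.array_equal(labels_flipped, hflip(labels))
# Invariance: a brightness change does not alter the labels.
invariant = np.array_equal(labels_bright, labels)
print(equivariant, invariant)
```

In this toy setting both properties hold by construction, because the embedding is purely per-pixel and scale-normalised. A learned convolutional embedding has no such guarantee, which is why the abstract describes these properties as something the learning objective has to encourage.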
