Unsupervised Semantic Segmentation by Distilling Feature Correspondences

Mark Hamilton; Zhoutong Zhang; Bharath Hariharan; Noah Snavely; William T. Freeman

arXiv:2203.08414 · cs.CV · March 17, 2022 · 114 citations

TL;DR

This paper introduces STEGO, a framework for unsupervised semantic segmentation that separates feature learning from clustering. It uses a contrastive loss to produce semantically meaningful pixel features and improves substantially on the prior state of the art on CocoStuff and Cityscapes.

Contribution

The paper proposes STEGO, a new framework that distills unsupervised features into discrete semantic labels with a novel contrastive loss, enhancing segmentation performance.

Findings

01. Improves on the prior state of the art by +14 mIoU on CocoStuff.

02. Improves on the prior state of the art by +9 mIoU on Cityscapes.

03. Outperforms previous state-of-the-art methods on both benchmarks.
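
For readers unfamiliar with the metric behind these findings, the following is a minimal sketch of mean intersection-over-union (mIoU) in NumPy. The function name `mean_iou` and the toy label maps are hypothetical and are not taken from the paper's code. The sketch assumes predicted labels are already aligned with ground-truth class IDs; in an unsupervised setting, predicted cluster IDs typically have to be matched to ground-truth classes before this step.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes (illustrative helper).

    pred, target: integer label maps of the same shape.
    Classes absent from both maps are skipped.
    """
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class appears in neither map
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Toy 2x2 label maps (made-up data, for illustration only).
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, target, num_classes=2)
```

A gain of "+14 mIoU" then means the score, expressed in percentage points, is 14 points higher than that of the previous best method.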

Abstract

Unsupervised semantic segmentation aims to discover and localize semantically meaningful categories within image corpora without any form of annotation. To solve this task, algorithms must produce features for every pixel that are both semantically meaningful and compact enough to form distinct clusters. Unlike previous works, which achieve this with a single end-to-end framework, we propose to separate feature learning from cluster compactification. Empirically, we show that current unsupervised feature learning frameworks already generate dense features whose correlations are semantically consistent. This observation motivates us to design STEGO (Self-supervised Transformer with Energy-based Graph Optimization), a novel framework that distills unsupervised features into high-quality discrete semantic labels. At the core of STEGO is…
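
To make the phrase "dense features whose correlations are semantically consistent" more concrete, here is a minimal sketch, assuming that correlation is measured as cosine similarity between per-location feature vectors. The function name `feature_correspondence`, the tensor shapes, and the random arrays standing in for backbone features are all assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def feature_correspondence(f, g, eps=1e-8):
    """Cosine similarity between every pair of spatial locations.

    f: (C, H, W) dense features for one image.
    g: (C, I, J) dense features for another (or the same) image.
    Returns an (H, W, I, J) array with values in [-1, 1].
    """
    # Normalize each location's feature vector along the channel axis.
    f = f / (np.linalg.norm(f, axis=0, keepdims=True) + eps)
    g = g / (np.linalg.norm(g, axis=0, keepdims=True) + eps)
    # Dot product over channels for every pair of locations.
    return np.einsum("chw,cij->hwij", f, g)

# Random stand-in for features from a pretrained backbone (assumption).
rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 4, 4))
corr = feature_correspondence(feats, feats)
```

Under this assumption, a high value at `corr[h, w, i, j]` means the features at location `(h, w)` and location `(i, j)` point in similar directions, which is the kind of signal the abstract describes as semantically consistent.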

Code & Models

Videos

Unsupervised Semantic Segmentation by Distilling Feature Correspondences · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Vision Transformer · Linear Layer · Residual Connection · Dropout · Adam · Softmax · Multi-Head Attention · Layer Normalization · Attention Is All You Need · Transformer