A Lightweight Clustering Framework for Unsupervised Semantic Segmentation

Yau Shing Jonathan Cheung; Xi Chen; Lihe Yang; Hengshuang Zhao

arXiv:2311.18628·cs.CV·January 1, 2024·1 cites

Open Access

TL;DR

This paper introduces a lightweight clustering framework leveraging self-supervised Vision Transformer features for unsupervised semantic segmentation, achieving state-of-the-art results without neural network training.

Contribution

The authors propose a novel multilevel clustering approach that exploits attention features for effective unsupervised segmentation, reducing computational complexity.

Findings

1. Achieves state-of-the-art results on PASCAL VOC and MS COCO datasets.

2. Demonstrates strong foreground-background differentiation in self-supervised Vision Transformer features.

3. Provides comprehensive analysis comparing DINO and DINOv2 features.

Abstract

Unsupervised semantic segmentation aims to categorize each pixel in an image into a corresponding class without the use of annotated data. It is a widely researched area as obtaining labeled datasets is expensive. While previous works in the field have demonstrated a gradual improvement in model accuracy, most required neural network training. This made segmentation equally expensive, especially when dealing with large-scale datasets. We thus propose a lightweight clustering framework for unsupervised semantic segmentation. We discovered that attention features of the self-supervised Vision Transformer exhibit strong foreground-background differentiability. Therefore, clustering can be employed to effectively separate foreground and background image patches. In our framework, we first perform multilevel clustering across the Dataset-level, Category-level, and Image-level, and maintain…
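The foreground-background clustering idea described in the abstract can be illustrated with a minimal sketch. It assumes synthetic patch features in place of real self-supervised ViT attention features, and the helper `two_means`, the 4x4 patch grid, and the 8-dimensional features are all hypothetical choices made for illustration; this is not the authors' implementation. It covers only a two-way split of one image's patches, not the multilevel (Dataset-level, Category-level, Image-level) procedure.

```python
import numpy as np


def two_means(features, n_iter=20):
    """Split rows of `features` into 2 clusters with plain k-means (Lloyd's algorithm).

    Hypothetical helper for illustration; not taken from the paper.
    """
    features = np.asarray(features, dtype=float)
    # Deterministic initialisation: the point farthest from the mean,
    # then the point farthest from that first point.
    mean = features.mean(axis=0)
    first = features[np.argmax(np.linalg.norm(features - mean, axis=1))]
    second = features[np.argmax(np.linalg.norm(features - first, axis=1))]
    centroids = np.stack([first, second])

    def assign(cents):
        # Distance of every patch to every centroid, shape (n_patches, 2).
        dists = np.linalg.norm(features[:, None, :] - cents[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for k in range(2):
            if np.any(labels == k):
                centroids[k] = features[labels == k].mean(axis=0)
    return assign(centroids)


# Synthetic stand-in for per-patch features (assumption: real features
# would come from a self-supervised ViT such as DINO).
rng = np.random.default_rng(0)
grid, dim = 4, 8
true_mask = np.zeros((grid, grid), dtype=int)
true_mask[1:3, 1:3] = 1  # a 2x2 "object" in the centre of the patch grid

# Foreground patches are centred at +1, background patches at -1.
centres = np.where(true_mask.reshape(-1, 1) == 1, 1.0, -1.0)
patch_features = centres + 0.1 * rng.standard_normal((grid * grid, dim))

# Cluster ids are arbitrary: k-means alone does not say which is foreground.
patch_mask = two_means(patch_features).reshape(grid, grid)
print(patch_mask)
```

Because the cluster ids are arbitrary, a further rule is needed to decide which cluster is the foreground; the sketch deliberately leaves that step out rather than guess at how the paper handles it.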

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Dropout · Label Smoothing · Adam · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Transformer · self-DIstillation with NO labels