ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation

Kehan Li; Zhennan Wang; Zesen Cheng; Runyi Yu; Yian Zhao; Guoli Song; Chang Liu; Li Yuan; Jie Chen

arXiv:2210.05944 · cs.CV · March 31, 2023 · 5 cites

Open Access

TL;DR

ACSeg introduces an adaptive, prototype-based approach for unsupervised semantic segmentation that dynamically generates image-specific concepts, significantly improving clustering accuracy in pixel-level semantic grouping.

Contribution

The paper proposes a novel adaptive conceptualization method with learnable prototypes and a modularity loss, enhancing unsupervised semantic segmentation performance.
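
To make the idea of a modularity objective concrete, the following is a minimal sketch, not the paper's implementation. It assumes the standard Newman modularity score computed on a cosine-affinity graph over pixel embeddings, with hard cluster labels standing in for assignments to prototypes. The function names `cosine_affinity` and `modularity` and the toy feature vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine_affinity(features):
    """Build an affinity matrix from cosine similarity.

    Assumption for this sketch: negative similarities are clipped to 0
    and self-loops are omitted.
    """
    norms = [math.sqrt(sum(x * x for x in f)) for f in features]
    n = len(features)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            affinity[i][j] = max(0.0, dot / (norms[i] * norms[j]))
    return affinity


def modularity(affinity, labels):
    """Newman modularity: Q = (1 / 2m) * sum_ij (A_ij - k_i * k_j / 2m) * [c_i == c_j]."""
    n = len(affinity)
    degree = [sum(row) for row in affinity]
    two_m = sum(degree)  # total edge weight, each edge counted in both directions
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += affinity[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Toy "pixel embeddings": two groups pointing in roughly orthogonal directions.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
A = cosine_affinity(features)

good = modularity(A, [0, 0, 1, 1])    # grouping that follows the embeddings
bad = modularity(A, [0, 1, 0, 1])     # grouping that cuts across them
single = modularity(A, [0, 0, 0, 0])  # everything merged into one group
print(good > bad, good > single)
```

In this sketch, a grouping that follows the embedding structure scores higher than one that cuts across it, and merging everything into one group scores zero. A score of this kind can be maximized without fixing the number of groups per image in advance, which is the property a modularity-style loss relies on.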

Findings

01. Achieved state-of-the-art results on benchmark datasets.
02. Effectively handles semantic diversity across images.
03. Demonstrates robustness to scene complexity.

Abstract

Recently, self-supervised large-scale visual pre-training models have shown great promise in representing pixel-level semantic relationships, significantly promoting the development of unsupervised dense prediction tasks, e.g., unsupervised semantic segmentation (USS). The extracted relationship among pixel-level representations typically contains rich class-aware information: semantically identical pixel embeddings gather together in the representation space to form sophisticated concepts. However, leveraging the learned models to ascertain semantically consistent pixel groups or regions in the image is non-trivial, since over/under-clustering overwhelms the conceptualization procedure under the varying semantic distributions of different images. In this work, we investigate the pixel-level semantic aggregation in self-supervised ViT pre-trained models as image segmentation and propose…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Adam