Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation

Fei Zhang; Tianfei Zhou; Boyang Li; Hao He; Chaofan Ma; Tianjiao Zhang; Jiangchao Yao; Ya Zhang; Yanfeng Wang

arXiv:2310.19001·cs.CV·October 31, 2023·5 cites

TL;DR

This paper introduces a novel approach for weakly open-vocabulary semantic segmentation by using explicit prototypical supervision to improve group token alignment, leading to more accurate and comprehensive segmentation results.

Contribution

The paper proposes non-learnable prototypical regularization (NPR) and the PGSeg network, which draw on prototypical knowledge from both images and texts to improve segmentation performance.

Findings

01. Achieves state-of-the-art results on benchmark datasets.

02. Effectively captures diverse semantic regions with less redundancy.

03. Improves group token alignment through prototypical supervision.

Abstract

This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS), which learns to segment objects of arbitrary classes using only image-text pairs. Existing works enhance the vanilla vision transformer by introducing explicit grouping recognition, i.e., employing several group tokens/centroids to cluster the image tokens and perform group-text alignment. Nevertheless, these methods suffer from a granularity inconsistency in how group tokens are used: they are aligned in an all-to-one manner during training but in a one-to-one manner during inference. We argue that this discrepancy arises from the lack of elaborate supervision for each group token. To bridge this granularity gap, this paper explores explicit supervision for the group tokens from prototypical knowledge. To this end, this paper proposes the non-learnable…
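The granularity inconsistency described in the abstract can be illustrated with a small sketch. This is not the PGSeg or NPR implementation; it is a toy example under stated assumptions. The function names (`train_time_score`, `inference_labels`), the use of mean pooling to merge group tokens, and the two-dimensional embeddings are all hypothetical choices made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pool(tokens):
    """Average a list of vectors into one vector (assumed pooling choice)."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


def train_time_score(group_tokens, caption_embedding):
    """All-to-one: every group token is merged into a single image-level
    vector, which is then compared with one caption embedding."""
    return cosine(mean_pool(group_tokens), caption_embedding)


def inference_labels(group_tokens, class_embeddings):
    """One-to-one: each group token is matched on its own to the most
    similar class text embedding."""
    labels = []
    for token in group_tokens:
        scores = [cosine(token, c) for c in class_embeddings]
        labels.append(scores.index(max(scores)))
    return labels


# Toy data (hypothetical): three group tokens, one caption, two class names.
group_tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
caption_embedding = [1.0, 1.0]
class_embeddings = [[1.0, 0.0], [0.0, 1.0]]

score = train_time_score(group_tokens, caption_embedding)
labels = inference_labels(group_tokens, class_embeddings)
print(f"training (all-to-one) similarity: {score:.3f}")
print(f"inference (one-to-one) labels: {labels}")
```

In this toy setup the training signal only constrains the pooled vector, so no individual group token receives direct supervision, yet at inference each token is classified independently. That mismatch is the gap the abstract says explicit per-token supervision is meant to close.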

Code & Models

Repositories

ferenas/pgseg
pytorch

Videos

Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Linear Layer · Residual Connection · Dense Connections · Layer Normalization · Vision Transformer