TAG: Guidance-free Open-Vocabulary Semantic Segmentation

Yasufumi Kawano; Yoshimitsu Aoki

arXiv:2403.11197 · cs.CV · March 19, 2024 · 1 citation

TL;DR

The paper introduces TAG, a novel guidance-free open-vocabulary semantic segmentation method that leverages pre-trained models to segment images without additional training or dense annotations, achieving state-of-the-art results.

Contribution

TAG is the first approach to perform open-vocabulary segmentation without guidance or training, using pre-trained models and external class label retrieval.
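
The idea of retrieving a class label from an external source, rather than from a user-supplied list, can be illustrated with a nearest-neighbor lookup. This is only a minimal sketch under stated assumptions, not the authors' pipeline: the three-dimensional vectors are toy values standing in for real image and text embeddings, and `label_db`, `cosine`, and `retrieve_label` are hypothetical names introduced here for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_label(segment_embedding, label_db):
    """Return the database label whose embedding is most similar to the segment."""
    return max(label_db, key=lambda name: cosine(segment_embedding, label_db[name]))

# Hypothetical external database: label -> embedding (toy values, not real features).
label_db = {
    "cat": [0.9, 0.1, 0.0],
    "grass": [0.0, 0.2, 0.9],
    "sky": [0.1, 0.9, 0.1],
}

# Hypothetical embedding of one image segment.
segment_embedding = [0.8, 0.2, 0.1]
label = retrieve_label(segment_embedding, label_db)
print(label)
```

Because the label set comes from the database rather than from the user, no class names need to be supplied at inference time in this toy setup.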

Findings

01. Achieves a +15.3 mIoU improvement on PascalVOC

02. State-of-the-art results on PascalContext and ADE20K

03. Operates without class name guidance or additional training
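
For readers unfamiliar with the metric in the first finding, mean intersection-over-union (mIoU) averages, over classes, the overlap between predicted and ground-truth pixels divided by their union. The following is a minimal sketch on a single toy label map; the function name `mean_iou` and the example arrays are illustrative assumptions, and benchmark implementations typically accumulate counts over a whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel counts toward both intersection and union of its class.
                inter[p] += 1
                union[p] += 1
            else:
                # Mismatch: pixel enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")

# Toy 2x3 label maps with three classes (0, 1, 2).
pred = [[0, 0, 1],
        [1, 1, 2]]
target = [[0, 1, 1],
          [1, 1, 2]]

score = mean_iou(pred, target, num_classes=3)
print(score)  # per-class IoUs are 0.5, 0.75, 1.0, so the mean is 0.75
```

mIoU is commonly reported on a 0 to 100 scale, so an improvement such as the one quoted above is usually read as percentage points.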

Abstract

Semantic segmentation is a crucial task in computer vision, where each pixel in an image is classified into a category. However, traditional methods face significant challenges, including the need for pixel-level annotations and extensive training. Furthermore, because supervised learning uses a limited set of predefined categories, models typically struggle with rare classes and cannot recognize new ones. Unsupervised and open-vocabulary segmentation, proposed to tackle these issues, face challenges of their own, including the inability to assign specific class labels to clusters and the necessity of user-provided text queries for guidance. In this context, we propose a novel approach, TAG, which achieves Training, Annotation, and Guidance-free open-vocabulary semantic segmentation. TAG utilizes pre-trained models such as CLIP and DINO to segment images into meaningful categories without additional…

Code & Models

Repositories

valkyrja3607/tag (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Dense Connections · Softmax · Layer Normalization · Multi-Head Attention · Residual Connection · Vision Transformer · self-DIstillation with NO labels