Open-Vocabulary Semantic Segmentation with Image Embedding Balancing

Xiangheng Shan; Dongyue Wu; Guilin Zhu; Yuanjie Shao; Nong Sang; Changxin Gao

arXiv:2406.09829·cs.CV·June 17, 2024·1 cites

TL;DR

This paper introduces EBSeg, a novel framework for open-vocabulary semantic segmentation that balances image embeddings and enforces semantic structure consistency, significantly improving generalization to new classes.

Contribution

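The paper proposes EBSeg, which adds an Adaptively Balanced Decoder (AdaB Decoder) and a Semantic Structure Consistency loss (SSC Loss) to CLIP-based segmentation. The decoder balances image embeddings suited to training classes against embeddings that generalize to new classes, and the loss aligns semantic structures for better generalization.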

Findings

01  Outperforms state-of-the-art methods on various benchmarks

02  Balances recognition of training classes with generalization to new classes

03  Improves semantic structure understanding in segmentation tasks

Abstract

Open-vocabulary semantic segmentation is a challenging task, which requires the model to output semantic masks of an image beyond a closed-set vocabulary. Although many efforts have been made to utilize powerful CLIP models to accomplish this task, they still easily overfit to training classes due to the natural gaps in semantic information between training and new classes. To overcome this challenge, we propose a novel framework for open-vocabulary semantic segmentation called EBSeg, incorporating an Adaptively Balanced Decoder (AdaB Decoder) and a Semantic Structure Consistency loss (SSC Loss). The AdaB Decoder is designed to generate different image embeddings for both training and new classes. Subsequently, these two types of embeddings are adaptively balanced to fully exploit their ability to recognize training classes and generalization ability for new classes. To learn a…
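
The abstract says the two kinds of image embeddings are balanced before classification, but the excerpt does not give the formula. As an illustration only, the sketch below assumes the simplest possible form: a convex combination with a fixed scalar weight `alpha`, followed by cosine-similarity scoring against class text embeddings, as is common in CLIP-style pipelines. The function names, the weight `alpha`, and the random stand-in embeddings are all hypothetical and are not taken from the paper or its repository; the paper's balancing is adaptive, which this fixed-weight sketch does not reproduce.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def balance_embeddings(emb_train, emb_general, alpha):
    """Hypothetical balancing: convex mix of two per-mask image embeddings.

    emb_train   : (num_masks, dim) embeddings tuned on training classes
    emb_general : (num_masks, dim) embeddings meant to generalize to new classes
    alpha       : assumed fixed weight in [0, 1] given to emb_train
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    mixed = alpha * l2_normalize(emb_train) + (1.0 - alpha) * l2_normalize(emb_general)
    return l2_normalize(mixed)


def classify_masks(image_emb, text_emb):
    """Cosine-similarity scores between mask embeddings and class text embeddings."""
    return l2_normalize(image_emb) @ l2_normalize(text_emb).T


# Random stand-ins for real model outputs: 4 masks, 3 classes, 16-dim embeddings.
rng = np.random.default_rng(0)
emb_train = rng.normal(size=(4, 16))
emb_general = rng.normal(size=(4, 16))
text_emb = rng.normal(size=(3, 16))

balanced = balance_embeddings(emb_train, emb_general, alpha=0.7)
logits = classify_masks(balanced, text_emb)   # shape (4, 3)
pred = logits.argmax(axis=1)                  # one class index per mask
```

In this sketch, setting `alpha` closer to 1 favors the embedding tuned on training classes, and setting it closer to 0 favors the more general embedding.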

Code & Models

Repositories

slonetime/ebseg (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training · Segment Anything Model