Open-vocabulary Panoptic Segmentation with Embedding Modulation

Xi Chen; Shuang Li; Ser-Nam Lim; Antonio Torralba; Hengshuang Zhao

arXiv:2303.11324 · cs.CV · July 18, 2023 · 1 citation

TL;DR

This paper introduces OPSNet, a data-efficient framework for open-vocabulary panoptic segmentation that leverages embedding modulation and CLIP to achieve state-of-the-art results across multiple datasets.

Contribution

The paper proposes OPSNet with an Embedding Modulation module, enabling effective open- and closed-vocabulary segmentation with less data and improved performance.

Findings

1. Achieves state-of-the-art results on multiple datasets

2. Effective embedding enhancement via Embedding Modulation

3. Superior performance with fewer additional data requirements

Abstract

Open-vocabulary image segmentation is attracting increasing attention due to its critical applications in the real world. Traditional closed-vocabulary segmentation methods are not able to characterize novel objects, whereas several recent open-vocabulary attempts obtain unsatisfactory results, i.e., notable performance reduction on the closed vocabulary and massive demand for extra data. To this end, we propose OPSNet, an omnipotent and data-efficient framework for Open-vocabulary Panoptic Segmentation. Specifically, the exquisitely designed Embedding Modulation module, together with several meticulous components, enables adequate embedding enhancement and information exchange between the segmentation model and the visual-linguistic well-aligned CLIP encoder, resulting in superior segmentation performance under both open- and closed-vocabulary settings with much fewer need of…
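The abstract does not describe how the Embedding Modulation module works, so the sketch below does not attempt to reproduce it. It only illustrates the generic idea that open-vocabulary methods built on CLIP-style encoders rely on: each predicted mask carries an embedding, and its class is chosen by cosine similarity against text embeddings of arbitrary class names. The function names, the temperature value, and the toy 3-dimensional vectors are all assumptions made for illustration; a real system would obtain the embeddings from a segmentation model and a text encoder.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_masks(mask_embeddings, text_embeddings, temperature=0.01):
    """Return per-mask class probabilities from cosine similarity to text embeddings.

    Hypothetical helper: `temperature` is an assumed value, not one taken from the paper.
    """
    text_unit = [l2_normalize(t) for t in text_embeddings]
    probabilities = []
    for embedding in mask_embeddings:
        unit = l2_normalize(embedding)
        similarities = [sum(a * b for a, b in zip(unit, t)) for t in text_unit]
        probabilities.append(softmax([s / temperature for s in similarities]))
    return probabilities


# Toy stand-ins for encoder outputs (assumed data, not real CLIP embeddings).
class_names = ["cat", "grass", "sky"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mask_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]

probs = classify_masks(mask_embeddings, text_embeddings)
predicted = [class_names[p.index(max(p))] for p in probs]
print(predicted)  # ['cat', 'sky']
```

Because the class list is just a set of text embeddings, new categories can be added at inference time by embedding new names, which is what makes this style of classifier open-vocabulary.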

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Contrastive Language-Image Pre-training