DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data

Yuanpeng Tu; Xi Chen; Ser-Nam Lim; Hengshuang Zhao

arXiv:2501.02048 · cs.CV · May 29, 2025

TL;DR

DreamMask introduces a data-centric approach for open-vocabulary panoptic segmentation by generating synthetic training data and aligning it with real data, significantly improving model generalization and performance.

Contribution

It presents a systematic data generation pipeline and a synthetic-real alignment loss, enhancing open-vocabulary segmentation without extensive manual data collection.

Findings

01. Outperforms previous state-of-the-art by 2.1% mIoU on ADE20K.

02. Synthetic data can surpass manually collected web data.

03. The approach simplifies large-scale data collection for segmentation tasks.
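
The first finding is reported in mIoU (mean intersection over union), which averages the per-class overlap between predicted and ground-truth labels. The following is a minimal pure-Python sketch of that metric for a single flattened label map, not the paper's evaluation code; the function name `mean_iou` and the choice to skip classes absent from both inputs are assumptions made for illustration. Benchmark evaluation normally accumulates intersections and unions over the whole dataset and ignores unlabeled pixels.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: equal-length sequences of integer class ids
    (for example, a flattened segmentation map).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: skip classes absent from both inputs
            ious.append(inter / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Hypothetical 4-pixel example with two classes.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy call, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is their average. A gain of 2.1% mIoU means this average rises by 2.1 percentage points.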

Abstract

Open-vocabulary panoptic segmentation has received significant attention due to its applicability in the real world. Despite claims of robust generalization, we find that the advancements of previous works are attributable mainly to trained categories, exposing a lack of generalization to novel classes. In this paper, we explore boosting existing models from a data-centric perspective. We propose DreamMask, which systematically explores how to generate training data in the open-vocabulary setting and how to train the model with both real and synthetic data. For the first part, we propose an automatic data generation pipeline built from off-the-shelf models, with crucial designs for vocabulary expansion, layout arrangement, data filtering, etc. Equipped with these techniques, our generated data could significantly outperform manually collected web data. To train the model with…

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Advanced Image and Video Retrieval Techniques

Methods: Softmax · Attention Is All You Need