MagicSeg: Open-World Segmentation Pretraining via Counterfactural Diffusion-Based Auto-Generation

Kaixin Cai; Pengzhen Ren; Jianhua Han; Yi Zhu; Hang Xu; Jianzhuang Liu; Xiaodan Liang

arXiv:2603.19575 · cs.CV · March 24, 2026

Open Access

TL;DR

MagicSeg introduces a diffusion model-based pipeline for automatically generating high-quality datasets with counterfactual samples, significantly improving open-world semantic segmentation performance without extensive manual annotation.

Contribution

It presents a novel diffusion model-driven dataset generation method that includes negative samples for contrastive training, enhancing open-world segmentation pretraining.

Findings

01 Achieves state-of-the-art results on PASCAL VOC, PASCAL Context, and COCO datasets.

02 Effectively generates high-fidelity images and precise masks from class labels.

03 Demonstrates the benefit of counterfactual samples in contrastive learning for segmentation.

Abstract

Open-world semantic segmentation presently relies significantly on extensive image-text pair datasets, which often suffer from a lack of fine-grained pixel annotations on sufficient categories. The acquisition of such data is rendered economically prohibitive due to the substantial investments of both human labor and time. In light of the formidable image generation capabilities of diffusion models, we introduce a novel diffusion model-driven pipeline for automatically generating datasets tailored to the needs of open-world semantic segmentation, named "MagicSeg". Our MagicSeg initiates from class labels and proceeds to generate high-fidelity textual descriptions, which in turn serve as guidance for the diffusion model to generate images. Rather than only generating positive samples for each label, our process encompasses the simultaneous generation of corresponding negative images,…
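
To make the label-to-prompt step concrete, here is a minimal sketch of how a class label might be turned into a positive prompt and a matching counterfactual (negative) prompt before either is passed to a diffusion model. Everything in it is an assumption for illustration: the `PromptPair` type, the `SCENES` table, the prompt templates, and the idea that a negative prompt is simply the same scene with the object left out are hypothetical and are not taken from the paper, whose abstract is truncated above before it describes how negatives are built.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PromptPair:
    """Hypothetical container for one label's positive and negative prompts."""
    label: str
    positive: str  # prompt in which the labeled object should appear
    negative: str  # counterfactual prompt: same scene, object absent


# Hypothetical scene contexts. The paper generates textual descriptions
# automatically; a fixed lookup table stands in for that step here.
SCENES: Dict[str, str] = {
    "dog": "a grassy park",
    "boat": "a calm harbor",
}


def build_prompt_pair(label: str, scene: Optional[str] = None) -> PromptPair:
    """Build a positive prompt and an assumed counterfactual prompt for a label."""
    if scene is None:
        scene = SCENES.get(label, "a plain background")
    positive = f"a photo of a {label} in {scene}"
    # Assumption: the counterfactual keeps the scene and drops the object.
    negative = f"a photo of {scene}"
    return PromptPair(label=label, positive=positive, negative=negative)


def build_dataset_prompts(labels: List[str]) -> List[PromptPair]:
    """Build one prompt pair per class label, preserving input order."""
    return [build_prompt_pair(label) for label in labels]


if __name__ == "__main__":
    for pair in build_dataset_prompts(["dog", "cat"]):
        print(pair.label, "|", pair.positive, "|", pair.negative)
```

In a full pipeline each prompt would then be sent to an image generator, and the positive and negative images would be paired for contrastive training; those stages are left out here because the source does not specify them.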

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis