MagicSeg: Open-World Segmentation Pretraining via Counterfactual Diffusion-Based Auto-Generation
Kaixin Cai, Pengzhen Ren, Jianhua Han, Yi Zhu, Hang Xu, Jianzhuang Liu, Xiaodan Liang

TL;DR
MagicSeg introduces a diffusion model-based pipeline for automatically generating high-quality datasets with counterfactual samples, significantly improving open-world semantic segmentation performance without extensive manual annotation.
Contribution
It presents a novel diffusion model-driven dataset generation method that includes negative samples for contrastive training, enhancing open-world segmentation pretraining.
Findings
Achieves state-of-the-art results on PASCAL VOC, PASCAL Context, and COCO datasets.
Effectively generates high-fidelity images and precise masks from class labels.
Demonstrates the benefit of counterfactual samples in contrastive learning for segmentation.
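To make the role of counterfactual samples in contrastive learning concrete, here is a minimal sketch of a generic InfoNCE-style loss. It is not the loss used in the paper, which this summary does not specify; the function name `contrastive_loss`, the `temperature` value, and the toy embeddings are all assumptions made for illustration. The idea shown is only that a text embedding is pulled toward the embedding of a positive sample and pushed away from embeddings of negative (counterfactual) samples.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def contrastive_loss(text_emb, positive_emb, negative_embs, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact loss).

    The loss is small when text_emb is closer to positive_emb than to any
    of the negative (counterfactual) embeddings.
    """
    t = normalize(text_emb)
    # Cosine similarities scaled by temperature; the positive comes first.
    logits = [dot(t, normalize(positive_emb)) / temperature]
    logits += [dot(t, normalize(n)) / temperature for n in negative_embs]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability assigned to the positive sample.
    return log_denominator - logits[0]


# Toy 2-D embeddings, purely hypothetical.
text = [1.0, 0.0]
aligned = contrastive_loss(text, [0.9, 0.1], [[0.0, 1.0]])
misaligned = contrastive_loss(text, [0.0, 1.0], [[0.9, 0.1]])
```

In this toy setup `aligned` comes out much smaller than `misaligned`, because in the first call the positive embedding points in nearly the same direction as the text embedding while the negative is orthogonal to it.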
Abstract
Open-world semantic segmentation presently relies significantly on extensive image-text pair datasets, which often suffer from a lack of fine-grained pixel annotations on sufficient categories. The acquisition of such data is rendered economically prohibitive due to the substantial investments of both human labor and time. In light of the formidable image generation capabilities of diffusion models, we introduce a novel diffusion model-driven pipeline for automatically generating datasets tailored to the needs of open-world semantic segmentation, named "MagicSeg". Our MagicSeg initiates from class labels and proceeds to generate high-fidelity textual descriptions, which in turn serve as guidance for the diffusion model to generate images. Rather than only generating positive samples for each label, our process encompasses the simultaneous generation of corresponding negative images,…
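The first stage described in the abstract, going from class labels to text descriptions for positive and negative images, could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract is truncated before it explains how negative images are defined, so the sketch assumes a negative prompt is simply the same scene description with the target class left out. The names `PromptPair`, `build_prompt_pair`, and `build_dataset_prompts`, the prompt templates, and the scene list are hypothetical, and the diffusion model call that would turn each prompt into an image is deliberately omitted.

```python
from dataclasses import dataclass


@dataclass
class PromptPair:
    label: str      # class label the pair was generated for
    positive: str   # prompt that should produce an image containing the class
    negative: str   # assumed counterfactual prompt: same scene, class absent


def build_prompt_pair(label: str, scene: str) -> PromptPair:
    """Build one positive/negative prompt pair (templates are assumptions)."""
    positive = f"a photo of a {label} in {scene}"
    negative = f"a photo of {scene}"
    return PromptPair(label, positive, negative)


def build_dataset_prompts(labels, scenes):
    """Cross every class label with every scene description."""
    return [build_prompt_pair(label, scene) for label in labels for scene in scenes]


# Hypothetical labels and scenes, for illustration only.
pairs = build_dataset_prompts(["dog", "bicycle"], ["a park", "a city street"])
for pair in pairs:
    print(pair.label, "|", pair.positive, "|", pair.negative)
```

Each resulting pair would then be passed to a text-to-image diffusion model, yielding one image with the class present and one without it.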
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
