MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation
Kaixin Cai, Pengzhen Ren, Yi Zhu, Hang Xu, Jianzhuang Liu, Changlin Li, Guangrun Wang, Xiaodan Liang

TL;DR
MixReorg is a novel pre-training method that improves open-world semantic segmentation by enhancing pixel-level semantic alignment through patch mixing and contrastive learning, enabling zero-shot object segmentation.
Contribution
The paper introduces MixReorg, a simple yet effective patch reorganization pre-training paradigm that significantly boosts zero-shot semantic segmentation performance without fine-tuning.
Findings
Outperforms GroupViT by 2.5-6.2% mIoU on various benchmarks.
Enables direct application to segment arbitrary categories in a zero-shot manner.
Enhances pixel-semantic alignment in text-supervised models.
Abstract
Recently, semantic segmentation models trained with image-level text supervision have shown promising results in challenging open-world scenarios. However, these models still face difficulties in learning fine-grained semantic alignment at the pixel level and predicting accurate object masks. To address this issue, we propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation that enhances a model's ability to reorganize patches mixed across images, exploring both local visual relevance and global semantic coherence. Our approach involves generating fine-grained patch-text pairs data by mixing image patches while preserving the correspondence between patches and text. The model is then trained to minimize the segmentation loss of the mixed images and the two contrastive losses of the original and restored features. With MixReorg as a mask learner,…
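The patch-mixing step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal reading of "mixing image patches while preserving the correspondence between patches and text", under the assumption that patches are swapped between images at the same spatial position. The function names `mix_patches` and `restore_patches`, and the representation of an image as a flat list of patches, are hypothetical choices made for illustration only.

```python
import random


def mix_patches(images, rng):
    """Mix patches across images, position by position (illustrative assumption).

    images: list of N images, each a list of P patches (any objects).
    Returns (mixed, source), where mixed[i][p] == images[source[i][p]][p].
    source[i] records which original image (and hence which caption) each
    patch of mixed image i came from, so it can act as a patch-level label.
    """
    n = len(images)
    num_patches = len(images[0])
    mixed = [[None] * num_patches for _ in range(n)]
    source = [[0] * num_patches for _ in range(n)]
    for p in range(num_patches):
        # A fresh permutation per position: every original patch is used once.
        perm = list(range(n))
        rng.shuffle(perm)
        for i in range(n):
            mixed[i][p] = images[perm[i]][p]
            source[i][p] = perm[i]
    return mixed, source


def restore_patches(mixed, source):
    """Invert mix_patches: send every patch back to its original image."""
    n = len(mixed)
    num_patches = len(mixed[0])
    restored = [[None] * num_patches for _ in range(n)]
    for i in range(n):
        for p in range(num_patches):
            restored[source[i][p]][p] = mixed[i][p]
    return restored


# Toy usage: 3 "images" of 4 patches each, with strings standing in for pixels.
images = [[f"img{i}_patch{p}" for p in range(4)] for i in range(3)]
mixed, source = mix_patches(images, random.Random(0))
restored = restore_patches(mixed, source)
```

In this sketch, `source` plays the role of the preserved patch-text correspondence: it could serve as the per-patch target for a segmentation loss on the mixed images, while `restored` corresponds to the restored features on which a contrastive loss could be computed. How the paper actually samples, weights, or restores patches is not stated in the text above.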
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
