Exploring Transformers for Open-world Instance Segmentation
Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo

TL;DR
This paper introduces SWORD, a Transformer-based approach for open-world instance segmentation that effectively discovers novel objects and improves generalization across datasets by combining stop-gradient operations and contrastive learning.
Contribution
The paper proposes a Transformer framework for open-world instance segmentation that combines a stop-gradient operation with contrastive learning, reporting state-of-the-art results while targeting both recall and precision.
Findings
Achieves 40.0% ARb100 (box average recall at 100 proposals) in the VOC to non-VOC setting
Outperforms previous models by 5.9% APm (mask average precision) on COCO to UVO
Demonstrates effective discovery of unseen objects in open-world settings
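For readers unfamiliar with the recall metric quoted above, the following is a minimal sketch of a COCO-style box average recall at k proposals. It is not the paper's evaluation code: the function names are hypothetical, and the matching is simplified so that each ground-truth box is scored against its best-overlapping proposal rather than through one-to-one assignment.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_recall(gt_boxes, proposals, k=100):
    """Mean recall over IoU thresholds 0.50, 0.55, ..., 0.95 using the top-k proposals.

    proposals is a list of (score, box) pairs. Simplifying assumption: a
    ground-truth box counts as recalled if ANY kept proposal overlaps it
    enough, so one proposal may cover several ground-truth boxes.
    """
    if not gt_boxes:
        return 0.0
    # Keep only the k highest-scoring proposals.
    top = sorted(proposals, key=lambda p: p[0], reverse=True)[:k]
    # Best IoU achieved for each ground-truth box.
    best = [max((box_iou(g, b) for _, b in top), default=0.0) for g in gt_boxes]
    thresholds = [(50 + 5 * i) / 100 for i in range(10)]
    recalls = [sum(iou >= t for iou in best) / len(gt_boxes) for t in thresholds]
    return sum(recalls) / len(thresholds)
```

In an open-world evaluation such as VOC to non-VOC, gt_boxes would presumably contain only objects from the unseen categories, so the score reflects how well novel objects are discovered.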
Abstract
Open-world instance segmentation is a rising task, which aims to segment all objects in the image by learning from a limited number of base-category objects. This task is challenging, as the number of unseen categories could be hundreds of times larger than that of seen categories. Recently, DETR-like models have been extensively studied in the closed world but remain unexplored in the open world. In this paper, we utilize the Transformer for open-world instance segmentation and present SWORD. Firstly, we attach a stop-gradient operation before the classification head and further add IoU heads for discovering novel objects. We demonstrate that a simple stop-gradient operation not only prevents the novel objects from being suppressed as background, but also allows the network to enjoy the merit of heuristic label assignment. Secondly, we propose a novel contrastive…
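The stop-gradient idea in the abstract can be illustrated with a toy scalar model. This is a sketch under stated assumptions, not SWORD's implementation: the single-weight "backbone", the squared-error IoU loss, and all names below are hypothetical. It shows that with a stop-gradient before the classification head, a "background" label on an unannotated object still trains the classifier but no longer pushes on the shared feature.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_gradients(x, w, v, a, y_cls, t_iou, stop_gradient=True):
    """Hand-derived gradients for a toy model (all names are illustrative).

    feature      h = w * x                  (stands in for the shared feature)
    class logit  z = v * sg(h)              (sg = optional stop-gradient)
    IoU output   u = a * h
    loss         BCE(sigmoid(z), y_cls) + (u - t_iou) ** 2   (assumed losses)
    """
    h = w * x
    z = v * h
    u = a * h
    dz = sigmoid(z) - y_cls      # d BCE / d z
    du = 2.0 * (u - t_iou)       # d squared error / d u
    # With the stop-gradient, the classification loss sends nothing back to h.
    dh_from_cls = 0.0 if stop_gradient else dz * v
    dh_from_iou = du * a
    return {
        "w": (dh_from_cls + dh_from_iou) * x,  # shared feature weight
        "v": dz * h,                           # classification head weight
        "a": du * h,                           # IoU head weight
    }


# A novel object that the label assignment treats as background (y_cls = 0).
with_sg = toy_gradients(1.0, 1.0, 1.0, 0.5, y_cls=0.0, t_iou=0.9)
without_sg = toy_gradients(1.0, 1.0, 1.0, 0.5, y_cls=0.0, t_iou=0.9, stop_gradient=False)
```

Comparing with_sg["w"] and without_sg["w"] shows the background signal disappearing from the shared weight, while the "v" entry is identical in both cases. In an autodiff framework the same effect is usually obtained by detaching the feature before the head, for example with `tensor.detach()` in PyTorch or `jax.lax.stop_gradient` in JAX.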
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Linear Layer · Adam · Dense Connections · Residual Connection · Dropout · Contrastive Learning · Absolute Position Encodings
