The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models

Cheng Shi; Sibei Yang

arXiv:2404.11957 · cs.CV · April 19, 2024 · 1 citation

TL;DR

This paper introduces Zip, a method that combines CLIP and SAM for annotation-free, open-vocabulary instance segmentation and object detection, boosting SAM's mask AP on COCO by 12.5% without human annotations.

Contribution

The paper proposes Zip, a new pipeline that leverages CLIP's boundary prior to enhance SAM for annotation-free, open-vocabulary segmentation and detection, achieving state-of-the-art results.

Findings

01. Zip boosts SAM's mask AP on COCO by 12.5%.

02. Zip achieves comparable performance to annotation-based methods.

03. Zip enables training-free and label-efficient segmentation.

Abstract

Foundation models, pre-trained on a large amount of data, have demonstrated impressive zero-shot capabilities in various downstream tasks. However, in object detection and instance segmentation, two fundamental computer vision tasks heavily reliant on extensive human annotations, foundation models such as SAM and DINO struggle to achieve satisfactory performance. In this study, we reveal that the devil is in the object boundary, i.e., these foundation models fail to discern boundaries between individual objects. For the first time, we show that CLIP, which has never accessed any instance-level annotations, can provide a highly beneficial and strong instance-level boundary prior in the clustering results of a particular intermediate layer. Following this surprising observation, we propose Zip, which Zips up CLip and SAM in a novel…
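The abstract's central observation, that clustering intermediate-layer features can expose boundaries between objects, can be illustrated with a toy sketch. This is not the authors' pipeline: the feature grid below is synthetic and only stands in for per-patch activations from some intermediate layer, the clustering is plain k-means with a farthest-point initialisation chosen for determinism, and the names `kmeans` and `boundary_map` are hypothetical helpers introduced here.

```python
import numpy as np


def kmeans(features, k, iters=10):
    """Plain Lloyd's k-means on (n, d) features; returns (n,) integer labels.

    Uses greedy farthest-point initialisation so the result is deterministic.
    """
    centers = [features[0]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([((features - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(features[d.argmax()])
    centers = np.stack(centers).astype(float)

    for _ in range(iters):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels


def boundary_map(label_grid):
    """Mark a cell as boundary if its right or bottom neighbour has a different label."""
    b = np.zeros(label_grid.shape, dtype=bool)
    b[:, :-1] |= label_grid[:, :-1] != label_grid[:, 1:]
    b[:-1, :] |= label_grid[:-1, :] != label_grid[1:, :]
    return b


# Synthetic stand-in for an (H, W, d) grid of patch features:
# two side-by-side "objects" whose features differ by a constant offset.
H, W, D = 8, 8, 16
rng = np.random.default_rng(0)
feats = rng.normal(0.0, 0.1, size=(H, W, D))
feats[:, W // 2:] += 5.0

labels = kmeans(feats.reshape(-1, D), k=2).reshape(H, W)
boundary = boundary_map(labels)
print(boundary.astype(int))
```

In this toy setting the only boundary cells are those along the seam between the two synthetic regions. How such a prior is extracted from CLIP and combined with SAM is described in the paper itself, not in this sketch.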

Code & Models

Repositories

chengshiest/zip-your-clip (PyTorch, official)

Taxonomy

Topics: Machine Learning and Data Classification · Image Processing and 3D Reconstruction

Methods: Dense Connections · Residual Connection · Softmax · Attention Is All You Need · Layer Normalization · Linear Layer · Multi-Head Attention · Vision Transformer · self-DIstillation with NO labels · Balanced Selection