Towards Training-free Open-world Segmentation via Image Prompt Foundation Models

Lv Tang; Peng-Tao Jiang; Hao-Ke Xiao; Bo Li

arXiv:2310.10912·cs.CV·June 27, 2024·2 cites

TL;DR

This paper introduces IPSeg, a training-free open-world segmentation method that uses image prompts with foundation models to efficiently segment objects without extensive training.

Contribution

It proposes a training-free approach to open-world segmentation that guides foundation models with image prompts.

Findings

01 Effective segmentation on the COCO and PASCAL VOC datasets

02 Requires no training for open-world segmentation

03 Uses image prompts to guide foundation models

Abstract

The realm of computer vision has witnessed a paradigm shift with the advent of foundational models, mirroring the transformative influence of large language models in the domain of natural language processing. This paper delves into the exploration of open-world segmentation, presenting a novel approach called Image Prompt Segmentation (IPSeg) that harnesses the power of vision foundational models. At the heart of IPSeg lies the principle of a training-free paradigm, which capitalizes on image prompt techniques. Specifically, IPSeg utilizes a single image containing a subjective visual concept as a flexible prompt to query vision foundation models like DINOv2 and Stable Diffusion. Our approach extracts robust features for the prompt image and input image, then matches the input representations to the prompt representations via a novel feature interaction module to generate point prompts highlighting…
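The matching step described in the abstract can be illustrated with a toy sketch. This is not the paper's feature interaction module, whose details the abstract does not give. It assumes plain cosine similarity as a stand-in, uses hand-written 2-D vectors in place of DINOv2 or Stable Diffusion features, and the names `cosine` and `point_prompts` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def point_prompts(prompt_feat, patch_feats, top_k=2):
    """Return the (row, col) positions of the top_k patches most similar to the prompt.

    prompt_feat: one feature vector summarizing the prompt image (assumption).
    patch_feats: grid (list of rows) of per-patch feature vectors for the input image.
    """
    scored = [
        (cosine(prompt_feat, feat), (r, c))
        for r, row in enumerate(patch_feats)
        for c, feat in enumerate(row)
    ]
    # Highest similarity first; the best-matching patches become point prompts.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pos for _, pos in scored[:top_k]]


# Toy data: a 2x3 grid of 2-D patch features standing in for real model features.
prompt = [1.0, 0.0]
grid = [
    [[0.0, 1.0], [0.9, 0.1], [0.1, 0.9]],
    [[1.0, 0.0], [-1.0, 0.0], [0.5, 0.5]],
]
points = point_prompts(prompt, grid, top_k=2)
print(points)  # [(1, 0), (0, 1)]
```

In this sketch the two patches whose features point in nearly the same direction as the prompt vector are selected. In practice the patch coordinates would be scaled back to pixel coordinates before being used as point prompts.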

Code & Models

Repositories

luckybird1994/ipseg (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Diffusion