Towards Training-free Open-world Segmentation via Image Prompt Foundation Models
Lv Tang, Peng-Tao Jiang, Hao-Ke Xiao, Bo Li

TL;DR
This paper introduces IPSeg, a training-free open-world segmentation method that queries vision foundation models with a single image prompt to segment objects.
Contribution
It proposes a novel training-free approach to open-world segmentation that uses image prompts to query foundation models.
Findings
Effective segmentation on COCO and PASCAL VOC datasets
Requires no training for open-world segmentation
Utilizes image prompts to guide foundation models
Abstract
The realm of computer vision has witnessed a paradigm shift with the advent of foundation models, mirroring the transformative influence of large language models in the domain of natural language processing. This paper explores open-world segmentation, presenting a novel approach called Image Prompt Segmentation (IPSeg) that harnesses the power of vision foundation models. At the heart of IPSeg lies the principle of a training-free paradigm, which capitalizes on image prompt techniques. Specifically, IPSeg utilizes a single image containing a subjective visual concept as a flexible prompt to query vision foundation models like DINOv2 and Stable Diffusion. Our approach extracts robust features for the prompt image and input image, then matches the input representations to the prompt representations via a novel feature interaction module to generate point prompts highlighting…
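The matching step described in the abstract (compare input-image features against prompt-image features, then emit point prompts) can be illustrated with a minimal sketch. This is not the paper's feature interaction module, whose details are not given here. It substitutes plain cosine similarity against an averaged prompt prototype, and it assumes patch features have already been extracted by some backbone. The function name `point_prompts_from_similarity`, the optional `prompt_mask` argument, and `top_k` are all hypothetical.

```python
import numpy as np

def point_prompts_from_similarity(prompt_feats, input_feats, prompt_mask=None, top_k=3):
    """Pick the input patches most similar to the prompt concept.

    prompt_feats: (Hp, Wp, C) patch features of the prompt image (assumed precomputed).
    input_feats:  (H, W, C) patch features of the input image (assumed precomputed).
    prompt_mask:  optional (Hp, Wp) boolean array marking the concept region
                  (hypothetical; all prompt patches are used when omitted).
    Returns (points, sim_map): points is (top_k, 2) as (row, col) patch indices.
    """
    c = prompt_feats.shape[-1]
    if prompt_mask is None:
        selected = prompt_feats.reshape(-1, c)
    else:
        selected = prompt_feats[prompt_mask]

    # Average the selected prompt features into one prototype and L2-normalize it.
    proto = selected.mean(axis=0)
    proto = proto / (np.linalg.norm(proto) + 1e-8)

    # L2-normalize every input patch feature, then take cosine similarity.
    h, w, _ = input_feats.shape
    flat = input_feats.reshape(-1, c)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    sim = flat @ proto

    # The top_k most similar patches become the point prompts.
    top = np.argsort(sim)[::-1][:top_k]
    rows, cols = np.unravel_index(top, (h, w))
    return np.stack([rows, cols], axis=1), sim.reshape(h, w)

# Toy data: the prompt concept points along axis 0 of feature space,
# and exactly one input patch, at (2, 3), shares that direction.
prompt_feats = np.zeros((2, 2, 4))
prompt_feats[..., 0] = 1.0
input_feats = np.zeros((4, 5, 4))
input_feats[..., 1] = 1.0
input_feats[2, 3] = [1.0, 0.0, 0.0, 0.0]

points, sim_map = point_prompts_from_similarity(prompt_feats, input_feats, top_k=1)
print(points)
```

On this toy input the single selected point is the patch at row 2, column 3. In a real pipeline the patch indices would still need to be mapped back to pixel coordinates before being handed to a promptable segmenter; that step is omitted here.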
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Diffusion
