Image Segmentation Using Text and Image Prompts
Timo Lüddecke, Alexander S. Ecker

TL;DR
This paper introduces a unified image segmentation system that uses text and image prompts at test time, enabling flexible, multi-task segmentation without retraining for new classes or complex queries.
Contribution
It presents a novel model that extends CLIP with a transformer-based decoder, so that a single model, trained once on an extended version of the PhraseCut dataset, can perform diverse segmentation tasks from arbitrary prompts.
Findings
Effective in referring expression, zero-shot, and one-shot segmentation
Handles arbitrary binary segmentation queries with text or image prompts
Shows good adaptation to generalized queries involving properties
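This page does not say which metrics support these findings. Binary segmentation quality is commonly scored with intersection-over-union (IoU) between a predicted mask and a ground-truth mask. The minimal sketch below shows that computation for one pair of masks. The function name `binary_iou` and the convention of scoring two empty masks as 1.0 are assumptions for illustration, not the paper's evaluation protocol.

```python
import numpy as np


def binary_iou(pred, target):
    """Intersection-over-union of two boolean masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks empty: treated here as a perfect match (a convention, not universal).
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


pred = np.array([[1, 1, 0], [0, 1, 0]])  # hypothetical predicted mask
gt = np.array([[1, 0, 0], [0, 1, 1]])    # hypothetical ground-truth mask
print(binary_iou(pred, gt))
```

Benchmarks usually average this score over many image/query pairs (mean IoU), sometimes after sweeping the threshold used to binarize the model's scores.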
Abstract
Image segmentation is usually addressed by training a model for a fixed set of object classes. Incorporating additional classes or more complex queries later is expensive as it requires re-training the model on a dataset that encompasses these expressions. Here we propose a system that can generate image segmentations based on arbitrary prompts at test time. A prompt can be either a text or an image. This approach enables us to create a unified model (trained once) for three common segmentation tasks, which come with distinct challenges: referring expression segmentation, zero-shot segmentation and one-shot segmentation. We build upon the CLIP model as a backbone which we extend with a transformer-based decoder that enables dense prediction. After training on an extended version of the PhraseCut dataset, our system generates a binary segmentation map for an image based on a free-text…
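The abstract's key idea is that one decoder serves every task because text and image prompts are both embedded by CLIP. A single conditioning vector therefore steers dense prediction, whichever kind of prompt produced it. The toy NumPy sketch below illustrates that idea under loud assumptions. The prompt modulates image patch features through a FiLM-style scale and shift, which is an assumption of this sketch because the abstract does not specify the conditioning mechanism. A single linear readout per patch stands in for the transformer decoder and the upsampling to pixel resolution. All sizes, weights, and embeddings are random placeholders rather than CLIP outputs.

```python
import numpy as np


def film(tokens, cond, w_gamma, w_beta):
    """FiLM-style modulation: scale and shift every token's features using the prompt."""
    gamma = cond @ w_gamma          # (D,) per-feature scale derived from the prompt
    beta = cond @ w_beta            # (D,) per-feature shift derived from the prompt
    return tokens * gamma + beta    # broadcast over all N tokens


def toy_segment(patch_tokens, prompt_emb, params, grid_hw):
    """Return (logit_map, binary_mask) for one image/prompt pair.

    patch_tokens: (N, D) image features, with N == grid_hw[0] * grid_hw[1]
    prompt_emb:   (C,) text OR image embedding (assumed to share one space)
    """
    h = film(patch_tokens, prompt_emb, params["w_gamma"], params["w_beta"])
    logits = h @ params["w_out"]         # (N,) one score per patch (toy linear readout)
    logit_map = logits.reshape(grid_hw)  # back onto the patch grid
    return logit_map, logit_map > 0.0    # logit > 0 is equivalent to sigmoid > 0.5


rng = np.random.default_rng(0)
N_H, N_W, D, C = 4, 4, 8, 6  # hypothetical toy sizes
params = {
    "w_gamma": rng.normal(size=(C, D)),
    "w_beta": rng.normal(size=(C, D)),
    "w_out": rng.normal(size=(D,)),
}
patches = rng.normal(size=(N_H * N_W, D))
text_prompt = rng.normal(size=C)   # stand-in for a CLIP text embedding
image_prompt = rng.normal(size=C)  # stand-in for a CLIP image embedding

# The same decoder weights handle both prompt types; only the conditioning vector changes.
for name, emb in [("text", text_prompt), ("image", image_prompt)]:
    logit_map, mask = toy_segment(patches, emb, params, (N_H, N_W))
    print(name, mask.shape, int(mask.sum()), "patches selected")
```

Because nothing in `toy_segment` depends on where `prompt_emb` came from, adding a new class or query only requires a new embedding, not retraining. That is the property the abstract emphasizes.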
Code & Models
- 🤗 CIDAS/clipseg-rd64-refined (model) · 992k dl · ♡ 137
- 🤗 CIDAS/clipseg-rd64 (model) · 354 dl · ♡ 3
- 🤗 CIDAS/clipseg-rd16 (model) · 91 dl
- 🤗 An-619/FastSAM (model) · ♡ 60
- 🤗 CatFlowerGames/clipseg-rd64-refined-cat (model) · 7 dl
- 🤗 csdl/clipseg-rd64-refined-with-handler (model) · 6 dl
- 🤗 EyeJack/fastsam-endpoint (model)
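The sketch below shows one way to run the first checkpoint listed above with text prompts. It assumes the Hugging Face `transformers` CLIPSeg classes (`CLIPSegProcessor`, `CLIPSegForImageSegmentation`), an installed `torch`, and network access to download the weights. The helper names `logits_to_mask` and `run_clipseg` and the 0.5 probability threshold are hypothetical choices for illustration. The heavy imports sit inside `run_clipseg` so the post-processing helper works with NumPy alone.

```python
import numpy as np


def logits_to_mask(logits, threshold=0.5):
    """Turn raw per-pixel logits into a binary mask via a sigmoid and a probability threshold."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    return probs > threshold


def run_clipseg(image, prompts, checkpoint="CIDAS/clipseg-rd64-refined"):
    """Segment a PIL `image` once per text prompt; returns a list of boolean masks.

    Imports are deferred so logits_to_mask can be used without torch/transformers.
    """
    import torch
    from transformers import CLIPSegForImageSegmentation, CLIPSegProcessor

    processor = CLIPSegProcessor.from_pretrained(checkpoint)
    model = CLIPSegForImageSegmentation.from_pretrained(checkpoint)
    # One copy of the image per prompt: each (image, prompt) pair is a separate binary query.
    inputs = processor(text=prompts, images=[image] * len(prompts),
                       padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # A single prompt may come back as (H, W); reshape to (num_prompts, H, W) either way.
    logits = logits.reshape(len(prompts), *logits.shape[-2:])
    return [logits_to_mask(row.numpy()) for row in logits]
```

For example, `run_clipseg(Image.open("kitchen.jpg").convert("RGB"), ["a cup", "a wooden table"])` (with `from PIL import Image` and a hypothetical file name) would return one mask per prompt. The masks are produced at the model's input resolution rather than the original image size, so resize them if you need pixel-aligned overlays.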
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
