Flexible visual prompts for in-context learning in computer vision

Thomas Foster; Ioana Croitoru; Robert Dorfman; Christoffer Edlund; Thomas Varsavsky; Jon Almazán

arXiv:2312.06592·cs.CV·December 12, 2023·1 cites

TL;DR

This paper introduces a flexible visual prompt approach to in-context learning for image segmentation. It adapts a Video Object Segmentation technique and outperforms existing techniques on diverse datasets, including classes unseen during training. A support set selection step further improves performance without extra training.

Contribution

The paper adapts a modern Video Object Segmentation (VOS) technique to visual in-context learning and proposes a support set selection technique that improves performance without additional training or prompt tuning.

Findings

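01. Outperforms existing techniques across a range of support set sizes and diverse segmentation datasets.

02. Excels on data containing classes not encountered during training.

03. Support set selection improves performance for all tested methods without additional training or prompt tuning.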

Abstract

In this work, we address in-context learning (ICL) for the task of image segmentation, introducing a novel approach that adapts a modern Video Object Segmentation (VOS) technique for visual in-context learning. This adaptation is inspired by the VOS method's ability to efficiently and flexibly learn objects from a few examples. Through evaluations across a range of support set sizes and on diverse segmentation datasets, our method consistently surpasses existing techniques. Notably, it excels with data containing classes not encountered during training. Additionally, we propose a technique for support set selection, which involves choosing the most relevant images to include in this set. By employing support set selection, the performance increases for all tested methods without the need for additional training or prompt tuning. The code can be found at…
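
The support set selection idea in the abstract (choosing the most relevant images to include in the set) can be illustrated with a minimal sketch. The abstract does not say how relevance is measured, so this sketch assumes each image is already represented by a feature vector and ranks candidates by cosine similarity to the query. The function names `cosine_similarity` and `select_support_set`, the similarity measure, and the parameter `k` are hypothetical and are not taken from the paper or its repository.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def select_support_set(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Return indices of the k candidates most similar to the query.

    Assumption: relevance is approximated by cosine similarity of
    precomputed image features. The paper may use a different criterion.
    """
    scores = [cosine_similarity(query, c) for c in candidates]
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    query_features = [1.0, 0.0]
    pool = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0], [-1.0, 0.0]]
    print(select_support_set(query_features, pool, k=2))
```

Because the selection only reorders an existing pool of annotated images, a step of this kind needs no training or prompt tuning, which matches the property the abstract claims for support set selection.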

Code & Models

Repositories

v7labs/xmem_icl (official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Sparse Evolutionary Training · VOS