Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model

Zheng Gu; Shiyuan Yang; Jing Liao; Jing Huo; Yang Gao

arXiv:2405.10316·cs.CV·May 17, 2024


Open Access

TL;DR

Analogist introduces a novel inference-based visual in-context learning method that combines visual and textual prompts using a pretrained image diffusion model, enabling flexible, out-of-the-box task generalization without fine-tuning.

Contribution

It proposes a new inference-based approach that combines visual and textual prompts with a text-to-image diffusion model pretrained for image inpainting, improving visual ICL without additional training.

Findings

01

Outperforms existing visual ICL methods qualitatively and quantitatively

02

Uses self-attention cloning for structural analogy in images

03

Employs GPT-4V for efficient text prompt generation
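Self-attention cloning can be illustrated with a small NumPy sketch. It assumes the example pair and the query are tiled into a 2x2 grid of tokens (A top-left, A' top-right, B bottom-left, the unknown B' bottom-right). It then copies the A'-to-A block of self-attention logits into the B'-to-B block, so that B' attends to B the way A' attends to A. The function names, the grid layout, editing pre-softmax logits, and the `scale` factor are illustrative assumptions, not the authors' code.

```python
import numpy as np


def quadrant_indices(h, w):
    """Row-major token indices of each quadrant of a (2h x 2w) token grid.

    Assumed layout: A top-left, A' top-right, B bottom-left, B' bottom-right.
    """
    flat = np.arange(4 * h * w).reshape(2 * h, 2 * w)
    return {
        "A": flat[:h, :w].ravel(),
        "A_prime": flat[:h, w:].ravel(),
        "B": flat[h:, :w].ravel(),
        "B_prime": flat[h:, w:].ravel(),
    }


def clone_self_attention(logits, h, w, scale=1.0):
    """Copy the A' -> A block of attention logits into the B' -> B block.

    `logits` is an (N, N) matrix of pre-softmax self-attention scores over
    the N = 4*h*w grid tokens; rows are queries and columns are keys.
    `scale` is a hypothetical strength factor (1.0 = plain copy).
    """
    idx = quadrant_indices(h, w)
    out = logits.copy()
    out[np.ix_(idx["B_prime"], idx["B"])] = (
        scale * logits[np.ix_(idx["A_prime"], idx["A"])]
    )
    return out


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w = 4, 4
    n = 4 * h * w
    logits = rng.normal(size=(n, n))
    attn = softmax(clone_self_attention(logits, h, w))
    print(attn.shape)
```

Because A and B are same-sized quadrants in the left column, the i-th token of A and the i-th token of B sit at the same relative position, which is what makes this block copy act as a spatial correspondence.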

Abstract

Visual In-Context Learning (ICL) has emerged as a promising research area due to its capability to accomplish various tasks with limited example pairs through analogical reasoning. However, training-based visual ICL has limitations in its ability to generalize to unseen tasks and requires the collection of a diverse task dataset. On the other hand, existing methods in the inference-based visual ICL category solely rely on textual prompts, which fail to capture fine-grained contextual information from given examples and can be time-consuming when converting from images to text prompts. To address these challenges, we propose Analogist, a novel inference-based visual ICL approach that exploits both visual and textual prompting techniques using a text-to-image diffusion model pretrained for image inpainting. For visual prompting, we propose a self-attention cloning (SAC) method to guide…
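Using an inpainting model for visual prompting can be pictured as filling in a missing quadrant. The sketch below assumes the example pair (A, A') and the query B are tiled as [[A, A'], [B, ?]] and that a mask value of 1 marks the region to inpaint. `make_analogy_canvas` and `extract_prediction` are hypothetical helpers, and the denoising step with a pretrained inpainting model is deliberately left out.

```python
import numpy as np


def make_analogy_canvas(a, a_prime, b, fill_value=0.0):
    """Tile (A, A') and query B into a 2x2 grid and build an inpainting mask.

    Assumed layout: [[A, A'], [B, ?]], where ? is the region to inpaint.
    All three images must share the same (H, W, C) shape.
    """
    if not (a.shape == a_prime.shape == b.shape):
        raise ValueError("A, A' and B must have the same shape")
    h, w = a.shape[:2]
    # Placeholder content for the unknown quadrant; the model overwrites it.
    placeholder = np.full_like(b, fill_value)
    top = np.concatenate([a, a_prime], axis=1)
    bottom = np.concatenate([b, placeholder], axis=1)
    canvas = np.concatenate([top, bottom], axis=0)
    # Assumed mask convention: 1 marks pixels the inpainting model should fill.
    mask = np.zeros(canvas.shape[:2], dtype=np.float32)
    mask[h:, w:] = 1.0
    return canvas, mask


def extract_prediction(canvas, h, w):
    """Crop the bottom-right quadrant (the predicted B') from a filled canvas."""
    return canvas[h:, w:]


if __name__ == "__main__":
    a = np.ones((8, 8, 3))
    canvas, mask = make_analogy_canvas(a, 2 * a, 3 * a)
    print(canvas.shape, mask.sum())
```

Keeping all three images in one canvas lets a single forward pass of the inpainting model see the example pair and the query together, which is what the self-attention manipulation operates on.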

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Diffusion