Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation

Bolin Lai; Felix Juefei-Xu; Miao Liu; Xiaoliang Dai; Nikhil Mehta; Chenguang Zhu; Zeyi Huang; James M. Rehg; Sangmin Lee; Ning Zhang; Tong Xiao

arXiv:2412.01027·cs.CV·December 4, 2024

Open Access

TL;DR

This paper introduces InstaManip, a multi-modal autoregressive model that enables rapid, few-shot image manipulation guided by text and visual examples, overcoming diffusion models' reasoning limitations.

Contribution

The paper proposes a novel group self-attention mechanism and relation regularization to improve few-shot image manipulation with in-context learning.

Findings

01. Outperforms previous models by at least 19% in human evaluation

02. Model's performance improves with more diverse exemplars

03. Effective disentanglement of image features achieved

Abstract

Text-guided image manipulation has experienced notable advancement in recent years. In order to mitigate linguistic ambiguity, few-shot learning with visual examples has been applied for instructions that are underrepresented in the training set, or difficult to describe purely in language. However, learning from visual prompts requires strong reasoning capability, which diffusion models are struggling with. To address this issue, we introduce a novel multi-modal autoregressive model, dubbed InstaManip, that can instantly learn a new image manipulation operation from textual and visual guidance via in-context learning, and apply it to new query images. Specifically, we propose an innovative group self-attention mechanism to break down the in-context learning process into two separate stages -- learning and applying, which simplifies the complex problem…
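The two-stage split described in the abstract can be pictured as a block attention mask. The sketch below is only an illustration under stated assumptions, not the paper's implementation: the three token groups (prompt, manipulation, query), their ordering, the visibility table, and the function names `group_attention_mask` and `masked_attention` are all hypothetical. It assumes that a small set of intermediate "manipulation" tokens reads the textual and visual prompt (learning), and that query tokens read only those intermediate tokens (applying).

```python
import math

def group_attention_mask(n_prompt, n_manip, n_query):
    """Return allowed[i][j]: True if token i may attend to token j.

    Assumed token order (hypothetical): prompt, manipulation, query.
    """
    group = [0] * n_prompt + [1] * n_manip + [2] * n_query
    # Rows are the attending group, columns the attended group.
    visible = [
        [True, False, False],  # prompt tokens see only the prompt
        [True, True, False],   # learning: manipulation tokens read the prompt
        [False, True, True],   # applying: query tokens read manipulation tokens only
    ]
    return [[visible[gi][gj] for gj in group] for gi in group]

def masked_attention(q, k, v, allowed):
    """Single-head scaled dot-product attention over lists of float vectors."""
    dim = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Only permitted keys take part; blocked keys get zero weight.
        idx = [j for j in range(len(k)) if allowed[i][j]]
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(dim) for j in idx]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j][d] for w, j in zip(weights, idx)) / total
            for d in range(len(v[0]))
        ])
    return out
```

Under this assumed mask, changing the prompt values within one attention layer alters the manipulation tokens' outputs but leaves the query tokens' outputs untouched, which is the sense in which the prompt can reach the query only through the intermediate tokens.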

Taxonomy

Topics: Image Processing Techniques and Applications · Advanced Image Processing Techniques · Advanced Vision and Imaging

Methods: Diffusion