In-Context Brush: Zero-shot Customized Subject Insertion with Context-Aware Latent Space Manipulation

Yu Xu; Fan Tang; You Wu; Lin Gao; Oliver Deussen; Hongbin Yan; Jintao Li; Juan Cao; Tong-Yee Lee

arXiv:2505.20271·cs.CV·May 27, 2025

TL;DR

In-Context Brush is a zero-shot framework that uses context-aware latent space manipulation to insert a customized subject into an image with high fidelity, guided by a subject image and textual prompts, without additional training.

Contribution

The paper proposes a novel zero-shot method using in-context learning and latent space manipulation for customized subject insertion in images, eliminating the need for model fine-tuning.

Findings

01. Achieves superior identity preservation and text alignment.
02. Outperforms state-of-the-art methods in image quality.
03. Operates without additional training or data collection.

Abstract

Recent advances in diffusion models have enhanced multimodal-guided visual generation, enabling customized subject insertion that seamlessly "brushes" user-specified objects into a given image guided by textual prompts. However, existing methods often struggle to insert customized subjects with high fidelity and to align results with the user's intent as expressed through textual prompts. In this work, we propose "In-Context Brush", a zero-shot framework for customized subject insertion that reformulates the task within the paradigm of in-context learning. Without loss of generality, we formulate the object image and the textual prompts as cross-modal demonstrations, and the target image with the masked region as the query. The goal is to inpaint the target image with the subject so that the result aligns with the textual prompts, without model tuning. Building upon a pretrained MMDiT-based inpainting network, we perform…
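
The task formulation in the abstract (object image and prompt as demonstrations, masked target image as query) can be pictured with a small data-structure sketch. This is only an illustration of the inputs, not the paper's implementation: the names `Demonstration`, `Query`, and `masked_target` are hypothetical, images are toy grayscale grids, and nothing here touches the MMDiT inpainting network or the latent space manipulation itself.

```python
from dataclasses import dataclass
from typing import List

# Assumption: toy grayscale images as nested lists, purely for illustration.
Image = List[List[float]]  # rows of pixel values
Mask = List[List[int]]     # 1 marks the region to inpaint, 0 keeps the pixel


@dataclass
class Demonstration:
    """Cross-modal demonstration: a subject image plus a textual prompt."""
    subject_image: Image
    prompt: str


@dataclass
class Query:
    """Query: the target image together with the mask of the region to fill."""
    target_image: Image
    mask: Mask


def masked_target(query: Query, fill: float = 0.0) -> Image:
    """Blank out the masked region so that only the known context remains."""
    if len(query.target_image) != len(query.mask):
        raise ValueError("image and mask must have the same height")
    result = []
    for image_row, mask_row in zip(query.target_image, query.mask):
        if len(image_row) != len(mask_row):
            raise ValueError("image and mask must have the same width")
        # Keep pixels where the mask is 0, replace them where it is 1.
        result.append([fill if m else p for p, m in zip(image_row, mask_row)])
    return result


demo = Demonstration(
    subject_image=[[0.9, 0.8], [0.7, 0.6]],
    prompt="a toy robot on the table",
)
query = Query(
    target_image=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    mask=[[0, 1, 1], [0, 0, 1]],
)
print(masked_target(query))  # [[0.1, 0.0, 0.0], [0.4, 0.5, 0.0]]
```

In this picture, a zero-shot method would receive `demo` and `query` at inference time and fill the blanked region; how the demonstration is injected into the network is described in the paper, not here.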

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: Softmax · Attention Is All You Need · Inpainting · Diffusion · ALIGN