Insert In Style: A Zero-Shot Generative Framework for Harmonious Cross-Domain Object Composition

Raghu Vamsi Chittersu; Yuvraj Singh Rathore; Pranav Adlinge; Kunal Swami

arXiv:2511.15197 · cs.CV · April 28, 2026

TL;DR

Insert In Style is a novel zero-shot generative framework for cross-domain object composition that achieves high fidelity and style harmony without online finetuning, using a multi-stage training protocol and specialized architecture.

Contribution

The paper introduces a unified zero-shot framework with disentangled representations and a masked-attention architecture for harmonious cross-domain object composition.

Findings

01 Outperforms existing methods on identity and style metrics.

02 Achieves state-of-the-art results in high-fidelity stylized composition.

03 Validated by user studies confirming improved performance.

Abstract

Reference-based object composition integrates a foreground reference image with a background scene to produce a harmonious fused image. The task becomes particularly challenging in cross-domain scenarios, where models must balance preserving the reference object's identity against harmonizing it with a stylized environment. This under-explored problem is currently split between practical "blenders" that lack generative fidelity and "generators" that require impractical, per-subject online finetuning. In this work, we introduce Insert In Style, the first zero-shot generative framework that is both practical and high-fidelity. Our core contribution is a unified framework with two key innovations: (i) a novel multi-stage training protocol that disentangles representations for identity, style, and composition, and (ii) a specialized masked-attention architecture that surgically…
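
For readers unfamiliar with the term, the following is a minimal, generic sketch of masked attention in plain Python. It is not the paper's architecture, which this excerpt does not describe; the function name `masked_attention`, the boolean mask layout, and the toy inputs are all assumptions made for illustration. It shows only the general idea that a binary mask can restrict which keys each query is allowed to attend to.

```python
import math

def masked_attention(queries, keys, values, mask):
    """Generic scaled dot-product attention with a boolean mask (illustrative only).

    mask[i][j] == False blocks query i from attending to key j.
    All names and the mask layout are hypothetical, not taken from the paper.
    """
    d = len(keys[0])
    outputs = []
    for q, row_mask in zip(queries, mask):
        # Scaled dot-product score for each allowed key; None marks a blocked key.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if allowed else None
            for k, allowed in zip(keys, row_mask)
        ]
        allowed_scores = [s for s in scores if s is not None]
        if not allowed_scores:
            raise ValueError("each query needs at least one allowed key")
        # Numerically stable softmax over allowed keys; blocked keys get weight 0.
        top = max(allowed_scores)
        weights = [0.0 if s is None else math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of value vectors.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs

# Toy example: one query that may only attend to the first of two keys.
result = masked_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
    mask=[[True, False]],
)
```

In this toy call the second key is masked out, so the output row is just the first value vector. In an image model the mask would typically be derived from a spatial region, but how Insert In Style builds or applies its mask is not stated in the text above.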
