Insert In Style: A Zero-Shot Generative Framework for Harmonious Cross-Domain Object Composition
Raghu Vamsi Chittersu, Yuvraj Singh Rathore, Pranav Adlinge, Kunal Swami

TL;DR
Insert In Style is a zero-shot generative framework for cross-domain object composition. It achieves high fidelity and style harmony without online finetuning by combining a multi-stage training protocol with a specialized masked-attention architecture.
Contribution
The paper introduces a unified zero-shot framework with disentangled representations and a masked-attention architecture for harmonious cross-domain object composition.
Findings
Outperforms existing methods on identity and style metrics.
Achieves state-of-the-art results in high-fidelity stylized composition.
Validated by user studies confirming improved performance.
Abstract
Reference-based object composition involves integrating a foreground reference image with a background scene to produce a harmonious fused image. The task becomes particularly challenging in cross-domain scenarios, where a model must preserve the reference object's identity while harmonizing it with a stylized environment. This under-explored problem is currently split between practical "blenders" that lack generative fidelity and "generators" that require impractical, per-subject online finetuning. In this work, we introduce Insert In Style, the first zero-shot generative framework that is both practical and high-fidelity. Our core contribution is a unified framework with two key innovations: (i) a novel multi-stage training protocol that disentangles representations for identity, style, and composition, and (ii) a specialized masked-attention architecture that surgically…
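For readers unfamiliar with the term, the following is a minimal, generic sketch of masked attention in NumPy. It is not the authors' architecture: the abstract is cut off before it says what the mask restricts, so the masking rule below (only scene tokens inside a hypothetical insertion region may attend to reference tokens) is an assumption made purely for illustration, as are all names (`masked_attention`, `allow`, `in_region`) and the token counts.

```python
import numpy as np

def masked_attention(q, k, v, allow):
    """Scaled dot-product attention with a boolean mask.

    allow[i, j] is True when query i may attend to key j.
    Assumes every row of `allow` contains at least one True entry.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Disallowed pairs get -inf so they receive exactly zero weight.
    scores = np.where(allow, scores, -np.inf)
    # Subtract the row maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Hypothetical toy setup: 4 scene tokens followed by 2 reference tokens.
rng = np.random.default_rng(0)
n_scene, n_ref, dim = 4, 2, 8
tokens = rng.normal(size=(n_scene + n_ref, dim))
# Assumed insertion region: scene tokens 1 and 2 lie inside it.
in_region = np.array([False, True, True, False])

n = n_scene + n_ref
allow = np.zeros((n, n), dtype=bool)
allow[:, :n_scene] = True                       # every token may attend to scene tokens
allow[:n_scene, n_scene:] = in_region[:, None]  # only in-region scene tokens see the reference
allow[n_scene:, n_scene:] = True                # reference tokens attend to each other

out, weights = masked_attention(tokens, tokens, tokens, allow)
```

In this toy setup, scene tokens outside the assumed region receive zero attention weight on the reference tokens, so reference features can only flow into the region where the object would be placed.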
