AIComposer: Any Style and Content Image Composition via Feature Integration
Haowen Li, Zhenfeng Fan, Zhang Wen, Zhengzhou Zhu, Yunjin Li

TL;DR
AIComposer introduces a novel cross-domain image composition method that eliminates the need for text prompts, effectively preserves content, and achieves superior stylization and composition quality using diffusion models and feature integration.
Contribution
It is the first method to enable text-prompt-free cross-domain image composition, using a simple MLP and local cross-attention to improve robustness and style transfer without training the diffusion model.
Findings
Outperforms state-of-the-art methods on the LPIPS and CSD metrics.
Preserves foreground content effectively during stylization.
Demonstrates robustness across diverse styles and contents.
Abstract
Image composition has advanced significantly with large-scale pre-trained T2I diffusion models. Despite progress in same-domain composition, cross-domain composition remains under-explored. The main challenges are the stochastic nature of diffusion models and the style gap between input images, leading to failures and artifacts. Additionally, heavy reliance on text prompts limits practical applications. This paper presents the first cross-domain image composition method that does not require text prompts, allowing natural stylization and seamless compositions. Our method is efficient and robust, preserving the diffusion prior, as it involves minor steps for backward inversion and forward denoising without training the diffuser. Our method also uses a simple multilayer perceptron network to integrate CLIP features from foreground and background, manipulating diffusion with a local…
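
As a rough illustration of the feature-integration idea mentioned in the abstract, the sketch below fuses a foreground and a background feature vector with a small MLP. The abstract does not give the architecture, so everything here is an assumption for illustration: the concatenation step, the two-layer shape, the ReLU activation, the toy dimensions, and the random weights and random stand-in vectors used in place of real CLIP features. The helper names (`make_layer`, `dense`, `fuse_features`) are hypothetical and not taken from the paper.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return a randomly initialised dense layer as (weights, biases)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def relu(value):
    return max(0.0, value)


def dense(layer, x, activation=None):
    """Apply one dense layer to the vector x, with an optional activation."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    if activation is not None:
        out = [activation(v) for v in out]
    return out


def fuse_features(fg_feat, bg_feat, hidden_layer, output_layer):
    """Concatenate both feature vectors and pass them through a two-layer MLP.

    Concatenation is an assumed fusion strategy, not the paper's confirmed one.
    """
    joint = list(fg_feat) + list(bg_feat)
    hidden = dense(hidden_layer, joint, activation=relu)
    return dense(output_layer, hidden)


rng = random.Random(0)
dim = 8  # toy feature size; real CLIP embeddings are much larger

# Random stand-ins for the foreground and background image features.
fg_feat = [rng.gauss(0, 1) for _ in range(dim)]
bg_feat = [rng.gauss(0, 1) for _ in range(dim)]

hidden_layer = make_layer(2 * dim, 16, rng)
output_layer = make_layer(16, dim, rng)

fused = fuse_features(fg_feat, bg_feat, hidden_layer, output_layer)
print(len(fused))
```

In a setup like the one the abstract describes, a fused vector of this kind would presumably condition the diffusion model in place of a text embedding. That conditioning step is not shown here.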
