TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization
Kien T. Pham, Jingye Chen, Qifeng Chen

TL;DR
TALE is a training-free framework that uses latent space manipulation and energy-guided optimization to improve cross-domain image composition with diffusion models, avoiding costly retraining and enhancing compositional fidelity.
Contribution
TALE introduces a latent-space approach that combines adaptive latent manipulation with energy-guided optimization for training-free image composition. The authors report that it outperforms existing attention-based methods.
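To make the phrase "energy-guided optimization" concrete, here is a minimal sketch of the general idea: a latent is nudged by gradient descent on an energy function. It is not the paper's method. The energy (a masked squared distance to a reference latent), the names `masked_energy` and `energy_guided_step`, the step size, and the latent shape are all assumptions made for illustration, since this summary does not define TALE's actual energy function.

```python
import numpy as np

def masked_energy(z, z_ref, mask):
    """Hypothetical energy: squared distance to a reference latent inside the mask."""
    diff = mask * (z - z_ref)
    return float(np.sum(diff ** 2))

def energy_guided_step(z, z_ref, mask, step_size=0.1):
    """One gradient-descent step on the hypothetical energy.

    The gradient is derived by hand: d/dz sum((mask * (z - z_ref))**2)
    = 2 * mask * (z - z_ref) for a binary mask (mask**2 == mask).
    """
    grad = 2.0 * mask * (z - z_ref)
    return z - step_size * grad

# Toy latents with an assumed (channels, height, width) layout.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8, 8))
z_ref = rng.normal(size=(4, 8, 8))

# Binary mask marking the region where the object is composed (assumed).
mask = np.zeros((1, 8, 8))
mask[:, 2:6, 2:6] = 1.0

e_before = masked_energy(z, z_ref, mask)
z_new = z
for _ in range(10):
    z_new = energy_guided_step(z_new, z_ref, mask)
e_after = masked_energy(z_new, z_ref, mask)

print(f"energy before: {e_before:.4f}, after: {e_after:.4f}")
```

In this toy setup the energy shrinks with every step inside the masked region, while latent values outside the mask are left untouched. A real pipeline would interleave such updates with the denoising steps of a diffusion model and would use whatever energy the paper defines.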
Findings
Outperforms prior methods in cross-domain image composition tasks.
Preserves object identity while achieving style adaptation.
Achieves state-of-the-art results across diverse domains.
Abstract
We present TALE, a novel training-free framework harnessing the generative capabilities of text-to-image diffusion models to address the cross-domain image composition task that focuses on flawlessly incorporating user-specified objects into a designated visual context regardless of domain disparity. Previous methods often involve either training auxiliary networks or finetuning diffusion models on customized datasets, which are expensive and may undermine the robust textual and visual priors of pre-trained diffusion models. Some recent works attempt to break the barrier by proposing training-free workarounds that rely on manipulating attention maps to tame the denoising process implicitly. However, composing via attention maps does not necessarily yield desired compositional outcomes. These approaches could only retain some semantic information and usually fall short in preserving…
Taxonomy
Methods: Softmax · Attention Is All You Need · Diffusion
