TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization

Kien T. Pham; Jingye Chen; Qifeng Chen

arXiv:2408.03637·cs.CV·August 8, 2024

TL;DR

TALE is a training-free framework that uses latent space manipulation and energy-guided optimization to improve cross-domain image composition with diffusion models, avoiding costly retraining and enhancing compositional fidelity.

Contribution

TALE introduces a latent-space approach that combines adaptive latent manipulation with energy-guided optimization for training-free image composition, and it outperforms existing attention-based methods.

Findings

01. Outperforms prior methods in cross-domain image composition tasks.

02. Effectively preserves object identity and style adaptation.

03. Achieves state-of-the-art results across diverse domains.

Abstract

We present TALE, a novel training-free framework harnessing the generative capabilities of text-to-image diffusion models to address the cross-domain image composition task, which focuses on flawlessly incorporating user-specified objects into designated visual contexts regardless of domain disparity. Previous methods often involve either training auxiliary networks or fine-tuning diffusion models on customized datasets, which is expensive and may undermine the robust textual and visual priors of pre-trained diffusion models. Some recent works attempt to break the barrier by proposing training-free workarounds that rely on manipulating attention maps to tame the denoising process implicitly. However, composing via attention maps does not necessarily yield the desired compositional outcomes. These approaches can only retain some semantic information and usually fall short in preserving…

Taxonomy

Methods: Softmax · Attention Is All You Need · Diffusion