Collage Diffusion
Vishnu Sarukkai, Linden Li, Arden Ma, Christopher Ré, Kayvon Fatahalian

TL;DR
Collage Diffusion introduces a method for precise, layer-based control over diffusion image generation, enabling users to specify object placement and attributes, and iteratively edit images while maintaining object fidelity.
Contribution
The paper presents a layer-based approach to detailed control and editing of generated images, which it reports improves object placement and attribute preservation compared to prior methods.
Findings
Better object placement accuracy in generated images
Enhanced preservation of key visual attributes
Enables iterative object editing in generated images
Abstract
We seek to give users precise control over diffusion-based image generation by modeling complex scenes as sequences of layers, which define the desired spatial arrangement and visual attributes of objects in the scene. Collage Diffusion harmonizes the input layers to make objects fit together -- the key challenge involves minimizing changes in the positions and key visual attributes of the input layers while allowing other attributes to change in the harmonization process. We ensure that objects are generated in the correct locations by modifying text-image cross-attention with the layers' alpha masks. We preserve key visual attributes of input layers by learning specialized text representations per layer and by extending ControlNet to operate on layers. Layer input allows users to control the extent of image harmonization on a per-object basis, and users can even iteratively edit…
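The abstract's idea of steering text-image cross-attention with each layer's alpha mask can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `masked_cross_attention`, the `token_layer` mapping, and the additive `penalty` bias are all assumptions made for illustration. The sketch only shows one plausible way to discourage a pixel from attending to a layer's text tokens when that pixel lies outside the layer's mask.

```python
import numpy as np

def masked_cross_attention(q, k, v, token_layer, alpha_masks, penalty=1e4):
    """Cross-attention with a per-layer spatial bias (illustrative sketch only).

    q:           (N, d) pixel queries
    k, v:        (T, d) text-token keys and values
    token_layer: length-T ints; layer index for each token, or -1 if the
                 token is not tied to any layer (hypothetical convention)
    alpha_masks: (L, N) alpha values in [0, 1], one flattened mask per layer
    penalty:     size of the negative bias applied outside a layer's mask
    """
    n, d = q.shape
    logits = q @ k.T / np.sqrt(d)              # (N, T) scaled dot products

    # Push down the logits of layer-bound tokens wherever alpha is low.
    bias = np.zeros_like(logits)
    for t, layer in enumerate(token_layer):
        if layer >= 0:
            bias[:, t] = -penalty * (1.0 - alpha_masks[layer])
    logits = logits + bias

    # Numerically stable softmax over the token axis.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights
```

With two layers whose masks cover disjoint pixels, each pixel's attention weight on the other layer's token collapses to roughly zero, while tokens not bound to any layer remain available everywhere.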
Videos
Collage Diffusion · YouTube
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Computer Graphics and Visualization Techniques
Methods: Diffusion
