Layout Control and Semantic Guidance with Attention Loss Backward for T2I Diffusion Model
Guandong Li

TL;DR
This paper introduces attention loss backward, a training-free method for controllable image generation with diffusion models. It steers layout and semantic attributes at inference time, with no training or fine-tuning.
Contribution
The paper presents a training-free approach that steers cross-attention maps to improve attribute accuracy and layout adherence in diffusion-based image generation.
Findings
Addresses attribute mismatch and prompt-following issues.
Achieves effective layout control without training.
Demonstrates practical application in production environments.
Abstract
Controllable image generation has always been one of the core demands in image generation, aiming to create images that are both creative and logical while satisfying additional specified conditions. In the post-AIGC era, controllable generation relies on diffusion models and is accomplished by maintaining certain components or introducing inference interferences. This paper addresses key challenges in controllable generation: 1. mismatched object attributes during generation and poor prompt-following effects; 2. inadequate completion of controllable layouts. We propose a train-free method based on attention loss backward, cleverly controlling the cross attention map. By utilizing external conditions such as prompts that can reasonably map onto the attention map, we can control image generation without any training or fine-tuning. This method addresses issues like attribute mismatch and…
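The abstract describes the mechanism only at a high level: a loss is computed on the cross-attention map and backpropagated at inference time, so no weights are trained. The following is a toy sketch of that general idea, not the paper's implementation. It assumes a single prompt token whose attention over four flattened image positions is a softmax of some scores, and it assumes a loss of the form (1 - attention mass inside the target region)^2, which is a common choice in layout-guidance work but is not confirmed by this summary. The names `layout_loss`, `layout_loss_grad`, and `guide`, as well as the step size and step count, are hypothetical. In a real diffusion pipeline the gradient would flow back to the noisy latent rather than to the scores directly.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def layout_loss(scores, mask):
    """Assumed loss: (1 - attention mass inside the target region)^2."""
    attn = softmax(scores)
    inside = sum(a for a, m in zip(attn, mask) if m)
    return (1.0 - inside) ** 2

def layout_loss_grad(scores, mask):
    """Analytic gradient of layout_loss with respect to the scores."""
    attn = softmax(scores)
    inside = sum(a for a, m in zip(attn, mask) if m)
    # d(inside)/d(s_j) = a_j * (m_j - inside), then apply the chain rule.
    return [
        -2.0 * (1.0 - inside) * a * ((1.0 if m else 0.0) - inside)
        for a, m in zip(attn, mask)
    ]

def guide(scores, mask, step_size=5.0, steps=50):
    """Gradient descent on the scores; stands in for updating the latent."""
    scores = list(scores)
    for _ in range(steps):
        grad = layout_loss_grad(scores, mask)
        scores = [s - step_size * g for s, g in zip(scores, grad)]
    return scores

# Usage: a 2x2 attention map flattened to 4 positions; the token should
# attend to the top-left position only (hypothetical layout condition).
initial_scores = [0.0, 0.0, 0.0, 0.0]
target_mask = [True, False, False, False]
guided_scores = guide(initial_scores, target_mask)
print("loss before:", layout_loss(initial_scores, target_mask))
print("loss after: ", layout_loss(guided_scores, target_mask))
```

Only the quantity being optimized changes during guidance; the model parameters are never touched, which is what makes this family of methods training-free.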
Taxonomy
Topics: Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques · Manufacturing Process and Optimization
Methods: Softmax · Attention Is All You Need · Diffusion
