Enhancing Conditional Image Generation with Explainable Latent Space Manipulation
Kshitij Pathania

TL;DR
This paper introduces a diffusion-based image synthesis method that uses explainable latent space manipulation and gradient-based attention to improve fidelity to reference images and adherence to conditional prompts. Reported results include lower FID scores, high CLIP scores for text-image alignment, and better quality metrics than baseline models.
Contribution
The paper presents a new approach combining diffusion models with explainable latent space manipulation and gradient-based attention, enhancing fidelity and controllability in image generation.
Findings
Achieves lower FID (Fréchet Inception Distance) scores, indicating better image fidelity.
Demonstrates high CLIP scores for text-image alignment.
Outperforms baseline models in quality metrics.
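For readers unfamiliar with the two metrics named above, here is a minimal NumPy sketch of what they measure. It assumes the feature statistics and embeddings are already available: in practice FID is computed on Inception-v3 features and the CLIP score on CLIP image and text embeddings, and this summary does not state which exact variants the paper uses. The function names `frechet_distance` and `cosine_score` are illustrative, not taken from the paper.

```python
import numpy as np

def frechet_distance(mu_r, sigma_r, mu_g, sigma_g):
    """Frechet distance between Gaussians N(mu_r, sigma_r) and N(mu_g, sigma_g).

    FID applies this formula to feature statistics of real (r) and
    generated (g) images; lower values mean the distributions are closer.
    """
    diff = mu_r - mu_g
    # Tr((sigma_r sigma_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma_r @ sigma_g, which are real and non-negative for
    # positive semi-definite inputs. Clipping guards against round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)

def cosine_score(image_emb, text_emb):
    """Cosine similarity between an image embedding and a text embedding.

    CLIP-style alignment scores are based on this quantity; some
    definitions additionally rescale it, which is not done here.
    """
    denom = np.linalg.norm(image_emb) * np.linalg.norm(text_emb)
    return float(image_emb @ text_emb / denom)

# Toy usage with made-up 2-D statistics and embeddings (not real data).
fid_like = frechet_distance(np.array([1.0, 0.0]), np.eye(2), np.zeros(2), np.eye(2))
alignment = cosine_score(np.array([1.0, 0.0]), np.array([2.0, 0.0]))
```

A lower Fréchet distance means the generated feature distribution sits closer to the real one, while a higher cosine score means the image embedding points in nearly the same direction as the prompt embedding.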
Abstract
In image synthesis, achieving fidelity to a reference image while adhering to conditional prompts remains a significant challenge. This paper proposes an approach that integrates a diffusion model with latent space manipulation and gradient-based selective attention mechanisms to address this issue. Using Grad-SAM (Gradient-based Selective Attention Manipulation), we analyze the cross-attention maps of the cross-attention layers, together with the gradients with respect to the denoised latent vector, to derive importance scores for the elements of the denoised latent vector related to the subject of interest. Using this information, we create masks at specific timesteps during denoising to preserve subjects while seamlessly integrating the reference image features. This approach ensures the faithful formation of subjects based on conditional prompts, while concurrently refining the background for a…
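The masking idea described in the abstract can be illustrated with a small NumPy sketch. This is not the paper's implementation: the gradient-weighted scoring rule (attention multiplied by the positive part of the gradient), the fixed `threshold`, and the names `importance_scores`, `subject_mask`, and `blend_latents` are all assumptions made for illustration, and the choice of timesteps at which to apply the mask is left out.

```python
import numpy as np

def importance_scores(attn_map, grad):
    """Score each latent position for the subject token (assumed rule).

    attn_map, grad: arrays of shape (H, W). Positions score highly when
    the cross-attention is strong and the gradient is positive.
    """
    scores = attn_map * np.maximum(grad, 0.0)
    peak = scores.max()
    # Normalize to [0, 1] so a single threshold is meaningful.
    return scores / peak if peak > 0 else scores

def subject_mask(scores, threshold=0.5):
    """Binary mask: 1 where a position is judged to belong to the subject."""
    return (scores >= threshold).astype(np.float32)

def blend_latents(denoised, reference, mask):
    """Keep the denoised latent inside the mask, the reference outside it.

    denoised, reference: arrays of shape (C, H, W); mask: shape (H, W),
    broadcast across the channel axis.
    """
    return mask * denoised + (1.0 - mask) * reference

# Toy usage on a 2x2 latent grid with 3 channels (values are made up).
attn = np.array([[0.9, 0.1], [0.2, 0.8]])
grad = np.array([[1.0, 1.0], [-1.0, 0.5]])
mask = subject_mask(importance_scores(attn, grad))
blended = blend_latents(np.ones((3, 2, 2)), np.zeros((3, 2, 2)), mask)
```

In this toy case only the top-left position clears the threshold, so the blended latent keeps the denoised values there and takes the reference values everywhere else.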
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Medical Image Segmentation Techniques · Image Retrieval and Classification Techniques
Methods: Softmax · Attention Is All You Need · Channel-wise Cross Attention · Diffusion · Contrastive Language-Image Pre-training
