Enhancing Conditional Image Generation with Explainable Latent Space Manipulation

Kshitij Pathania

arXiv:2408.16232·cs.CV·August 30, 2024

TL;DR

This paper introduces a novel diffusion-based image synthesis method that uses explainable latent space manipulation and gradient-based attention to improve fidelity to reference images and conditional prompts, achieving superior results in quality and alignment.

Contribution

The paper presents a new approach combining diffusion models with explainable latent space manipulation and gradient-based attention, enhancing fidelity and controllability in image generation.

Findings

01. Achieves lower FID scores, indicating better image fidelity.
02. Demonstrates high CLIP scores for text-image alignment.
03. Outperforms baseline models in quality metrics.
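The first finding is stated in terms of FID, so a short sketch of how that metric is computed may help when reading it. FID fits a Gaussian to feature vectors of real and generated images and measures the Fréchet distance between the two Gaussians, ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2(Sigma_r Sigma_g)^{1/2}), so lower values mean the two feature distributions are closer. This is a minimal NumPy sketch that assumes the features have already been extracted (standard FID uses Inception-v3 pooling features). It is not the paper's evaluation code, and the function names are illustrative.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    w, v = np.linalg.eigh(mat)
    # Clip tiny negative eigenvalues caused by floating-point error.
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """FID-style Frechet distance between two (num_samples, dim) feature sets."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    # Tr((S_r S_g)^{1/2}) equals Tr((S_r^{1/2} S_g S_r^{1/2})^{1/2}). The inner
    # matrix is symmetric PSD, so its eigenvalues are real and non-negative.
    sqrt_r = _sqrtm_psd(sigma_r)
    inner = sqrt_r @ sigma_g @ sqrt_r
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_covmean)
```

In practice FID is estimated from thousands of images, because the covariance estimate is noisy when the sample size is small.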

Abstract

In the realm of image synthesis, achieving fidelity to a reference image while adhering to conditional prompts remains a significant challenge. This paper proposes a novel approach that integrates a diffusion model with latent space manipulation and gradient-based selective attention mechanisms to address this issue. Leveraging Grad-SAM (Gradient-based Selective Attention Manipulation), we analyze the cross-attention maps of the cross-attention layers together with the gradients for the denoised latent vector, deriving importance scores for the elements of the denoised latent vector that relate to the subject of interest. Using this information, we create masks at specific timesteps during denoising to preserve subjects while seamlessly integrating the reference image features. This approach ensures the faithful formation of subjects based on conditional prompts, while concurrently refining the background for a…
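The abstract describes three steps: scoring latent positions from cross-attention maps and gradients, turning those scores into a mask at a chosen timestep, and using the mask to combine the subject with reference-image features. The sketch below illustrates these steps under stated assumptions and is not the author's implementation. It assumes the cross-attention probabilities and their gradients have already been extracted from one denoising step as NumPy arrays of shape (heads, H*W, tokens). It weights attention by the positive part of the gradient, in the spirit of Grad-SAM, then min-max normalizes the result and thresholds it into a binary mask. Subject positions are taken from the prompt-guided latent and the remaining positions from a reference latent. The function names, the 0.5 threshold, and the hard blending rule are all hypothetical.

```python
import numpy as np


def importance_scores(attn: np.ndarray, grad: np.ndarray, token_idx: int) -> np.ndarray:
    """Hypothetical Grad-SAM-style importance score per latent position.

    attn: cross-attention probabilities, shape (heads, H*W, num_tokens).
    grad: gradients of a scalar objective w.r.t. attn, same shape (assumed
          to be computed beforehand by an autodiff framework).
    Returns an array of shape (H*W,) for the subject token `token_idx`.
    """
    # Weight each attention value by its positive gradient (negative
    # gradients are clipped to zero), then average over attention heads.
    weighted = attn[:, :, token_idx] * np.maximum(grad[:, :, token_idx], 0.0)
    return weighted.mean(axis=0)


def subject_mask(scores: np.ndarray, height: int, width: int,
                 threshold: float = 0.5) -> np.ndarray:
    """Min-max normalize scores and threshold them into a binary (H, W) mask."""
    lo, hi = scores.min(), scores.max()
    norm = (scores - lo) / (hi - lo + 1e-8)  # epsilon avoids division by zero
    return (norm >= threshold).astype(np.float32).reshape(height, width)


def blend_latents(z_gen: np.ndarray, z_ref: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep subject positions from z_gen and take the rest from z_ref.

    z_gen, z_ref: latents of shape (channels, H, W); mask: shape (H, W).
    """
    m = mask[None, :, :]  # broadcast the mask over channels
    return m * z_gen + (1.0 - m) * z_ref
```

In a full sampler, one would recompute the scores from the current step's attention maps at each selected timestep. The blended latent would then replace the current one before denoising continues. A soft (non-binary) mask is an equally plausible variant and would blend the background more gradually.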

Code & Models

Repositories

kshitij79/CS-7476-Improvements-in-Diffusion-Model (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Medical Image Segmentation Techniques · Image Retrieval and Classification Techniques

Methods: Softmax · Attention Is All You Need · Channel-wise Cross Attention · Diffusion · Contrastive Language-Image Pre-training