Diffusion-based Image Translation using Disentangled Style and Content Representation
Gihyun Kwon, Jong Chul Ye

TL;DR
This paper introduces a diffusion-based unsupervised image translation method that leverages disentangled style and content representations. It improves content preservation and style-transfer flexibility through a ViT-based content loss, [CLS]-token and CLIP style losses, a semantic divergence loss, and resampling.
Contribution
The paper proposes a diffusion-based approach built on disentangled style and content features, adding a semantic divergence loss and a resampling strategy to improve image translation quality.
Findings
Outperforms state-of-the-art models in text-guided translation
Maintains better content preservation during diffusion
Effective style transfer guided by CLIP and ViT features
Abstract
Diffusion-based image translation guided by semantic text or a single target image has enabled flexible style transfer that is not limited to specific domains. Unfortunately, due to the stochastic nature of diffusion models, it is often difficult to maintain the original content of the image during the reverse diffusion. To address this, we present a novel diffusion-based unsupervised image translation method using disentangled style and content representation. Specifically, inspired by Splicing ViT, we extract the intermediate keys of the multi-head self-attention layers of a ViT model and use them in a content preservation loss. Image-guided style transfer is then performed by matching the [CLS] classification token of the denoised samples to that of the target image, while an additional CLIP loss is used for text-driven style transfer. To further accelerate the…
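The loss terms named in the abstract can be illustrated with a minimal NumPy sketch. It assumes the ViT keys, the [CLS] tokens, and the CLIP image/text embeddings have already been extracted as arrays. The function names, the Splicing-ViT-style self-similarity form of the content loss, the plain cosine form of the CLIP loss, and the loss weights are all illustrative assumptions, not the authors' implementation; the paper's exact layer choices and any additional loss terms may differ.

```python
import numpy as np


def predict_x0(x_t, eps_pred, alpha_bar_t):
    """Denoised estimate of x0 from x_t and the predicted noise (standard DDPM identity)."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)


def key_self_similarity(keys):
    """Cosine self-similarity of ViT keys: (tokens, dim) -> (tokens, tokens)."""
    normed = keys / np.linalg.norm(keys, axis=-1, keepdims=True)
    return normed @ normed.T


def content_loss(keys_x0, keys_src):
    """Assumed Splicing-ViT-style structure loss: Frobenius gap between key self-similarities."""
    diff = key_self_similarity(keys_x0) - key_self_similarity(keys_src)
    return float(np.linalg.norm(diff, ord="fro"))


def cls_style_loss(cls_x0, cls_target):
    """Image-guided style term: L2 distance between [CLS] tokens of the x0 estimate and target."""
    return float(np.linalg.norm(cls_x0 - cls_target))


def clip_text_loss(img_emb, txt_emb):
    """Text-guided style term (assumed plain form): 1 - cosine(CLIP image emb, CLIP text emb)."""
    cos = img_emb @ txt_emb / (np.linalg.norm(img_emb) * np.linalg.norm(txt_emb))
    return float(1.0 - cos)


def total_loss(content, style, lambda_content=1.0, lambda_style=1.0):
    """Hypothetical weighted sum; the default weights are placeholders, not the paper's values."""
    return lambda_content * content + lambda_style * style


# Usage with random stand-ins for extracted features (197 tokens = 196 patches + [CLS]).
rng = np.random.default_rng(0)
keys_src = rng.normal(size=(197, 64))
keys_x0 = keys_src + 0.1 * rng.normal(size=(197, 64))
cls_x0, cls_tgt = rng.normal(size=768), rng.normal(size=768)
loss = total_loss(content_loss(keys_x0, keys_src), cls_style_loss(cls_x0, cls_tgt))
print(f"total loss: {loss:.3f}")
```

Because the content term compares cosine self-similarity matrices rather than raw keys, it constrains spatial structure while leaving appearance free to change, which is the intuition behind separating content from style. In a real guided sampler these losses would be evaluated on the denoised estimate at each reverse step and differentiated with an autodiff framework to steer the sample; NumPy is used here only to make the arithmetic explicit.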
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
Methods: Attention Is All You Need · Linear Layer · Label Smoothing · Multi-Head Attention · Adam · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Layer Normalization
