Diffusion-based Image Translation using Disentangled Style and Content Representation

Gihyun Kwon; Jong Chul Ye

arXiv:2209.15264 · cs.CV · February 2, 2023 · 46 cites

TL;DR

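This paper introduces a diffusion-based unsupervised image translation method built on disentangled style and content representations. It aims to preserve the source content better and make style transfer more flexible, using new loss functions (a content loss on ViT self-attention keys, a style loss on the ViT [CLS] token, and a CLIP loss for text guidance) together with new sampling strategies.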

Contribution

The paper proposes a diffusion-based approach that uses disentangled style and content features, together with a semantic divergence loss and a resampling strategy, to improve image translation quality.

Findings

01. Outperforms state-of-the-art models in text-guided image translation
02. Preserves the source image's content better during reverse diffusion
03. Performs effective style transfer guided by CLIP and ViT features

Abstract

Diffusion-based image translation guided by semantic texts or a single target image has enabled flexible style transfer that is not limited to specific domains. Unfortunately, due to the stochastic nature of diffusion models, it is often difficult to maintain the original content of the image during the reverse diffusion. To address this, we present a novel diffusion-based unsupervised image translation method using disentangled style and content representation. Specifically, inspired by the splicing Vision Transformer, we extract the intermediate keys of the multi-head self-attention layer from a ViT model and use them in the content preservation loss. Then, image-guided style transfer is performed by matching the [CLS] classification token of the denoised samples and the target image, while an additional CLIP loss is used for text-driven style transfer. To further accelerate the…
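The abstract names three guidance terms: a content loss on ViT attention keys, an image-guided style loss on the [CLS] token, and a CLIP loss for text guidance. The plain-Python sketch below shows one way these terms could be computed on precomputed feature vectors. It is not the authors' implementation. Several parts are assumptions: the keys are compared through their token-by-token cosine self-similarity matrix, in the spirit of Splicing ViT; the style loss is an L2 distance between [CLS] tokens; the CLIP term is a simple cosine distance. The function names and the `lam_*` weights are hypothetical. Feature extraction (ViT, CLIP) and the gradient step that would use these losses to steer reverse diffusion are omitted.

```python
import math
from typing import List, Optional

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def self_similarity(keys: List[Vector]) -> List[List[float]]:
    """Token-by-token cosine similarity of ViT attention keys (assumed Splicing-ViT style)."""
    return [[cosine(ki, kj) for kj in keys] for ki in keys]


def content_loss(src_keys: List[Vector], gen_keys: List[Vector]) -> float:
    """Frobenius distance between the key self-similarity matrices of source and denoised sample."""
    s_src, s_gen = self_similarity(src_keys), self_similarity(gen_keys)
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(s_src, s_gen)
                         for a, b in zip(row_a, row_b)))


def style_loss(target_cls: Vector, gen_cls: Vector) -> float:
    """L2 distance between the [CLS] tokens of the target image and the denoised sample."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(target_cls, gen_cls)))


def clip_text_loss(image_emb: Vector, text_emb: Vector) -> float:
    """Cosine distance between CLIP image and text embeddings (simple global variant, assumed)."""
    return 1.0 - cosine(image_emb, text_emb)


def guidance_loss(src_keys: List[Vector], gen_keys: List[Vector], *,
                  target_cls: Optional[Vector] = None, gen_cls: Optional[Vector] = None,
                  image_emb: Optional[Vector] = None, text_emb: Optional[Vector] = None,
                  lam_content: float = 1.0, lam_style: float = 1.0,
                  lam_clip: float = 1.0) -> float:
    """Hypothetical weighted sum: content term always, plus an image- or text-driven style term."""
    loss = lam_content * content_loss(src_keys, gen_keys)
    if target_cls is not None and gen_cls is not None:  # image-guided translation
        loss += lam_style * style_loss(target_cls, gen_cls)
    if image_emb is not None and text_emb is not None:  # text-guided translation
        loss += lam_clip * clip_text_loss(image_emb, text_emb)
    return loss


if __name__ == "__main__":
    src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy "keys" for three tokens
    scaled = [[2 * x for x in k] for k in src]    # same structure, different magnitude
    print(content_loss(src, scaled))              # close to zero
    print(guidance_loss(src, src, target_cls=[0.0, 0.0], gen_cls=[3.0, 4.0]))
```

Each token's keys are normalized by the cosine, so rescaling all keys leaves `content_loss` at (approximately) zero. Under this assumed formulation, the term constrains the spatial structure of the image rather than feature magnitudes, which is what lets it coexist with a style term that changes appearance.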

Code & Models

Repositories

anon294384/diffuseit (PyTorch, official)

Models

BiliSakura/DiffuseIT-ckpt (Hugging Face model)

Videos

Diffusion-based Image Translation using disentangled style and content representation · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Attention Is All You Need · Linear Layer · Label Smoothing · Multi-Head Attention · Adam · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Layer Normalization