Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation

Xiang Gao; Zhengbo Xu; Junhan Zhao; Jiaying Liu

arXiv:2407.03006·cs.CV·March 28, 2025

TL;DR

The paper introduces FCDiffusion, a frequency-controlled diffusion framework for versatile text-guided image-to-image translation, leveraging DCT-based filtering to control various image attributes in a unified manner.

Contribution

It proposes a novel frequency-domain filtering approach within diffusion models for flexible, multi-attribute text-guided image translation, enabling diverse applications with a single framework.

Findings

01  Effective control over style, structure, and layout in image translation.

02  Outperforms existing methods in qualitative and quantitative evaluations.

03  Versatile application across multiple image translation tasks.

Abstract

Recently, large-scale text-to-image (T2I) diffusion models have emerged as a powerful tool for image-to-image translation (I2I), allowing open-domain image translation via user-provided text prompts. This paper proposes the frequency-controlled diffusion model (FCDiffusion), an end-to-end diffusion-based framework that contributes a novel solution to text-guided I2I from a frequency-domain perspective. At the heart of the framework is a feature-space frequency-domain filtering module based on the Discrete Cosine Transform (DCT), which filters the latent features of the source image in the DCT domain, yielding filtered image features that carry different DCT spectral bands and serve as different control signals to the pre-trained Latent Diffusion Model. We reveal that control signals of different DCT spectral bands bridge the source image and the T2I generated image in different correlations (e.g., style,…
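The DCT-domain band filtering described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The helper names (`dct_matrix`, `band_filter`), the rule of selecting coefficients by the index sum u + v, the band thresholds, and the 4x4 toy feature map are all assumptions made for illustration only.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (n x n)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(x):
    """2D DCT of a square map: C x C^T."""
    c = dct_matrix(len(x))
    return matmul(matmul(c, x), transpose(c))

def idct2(spec):
    """Inverse 2D DCT: C^T X C (valid because C is orthonormal)."""
    c = dct_matrix(len(spec))
    return matmul(matmul(transpose(c), spec), c)

def band_filter(x, low, high):
    """Keep only DCT coefficients with low <= u + v < high (assumed mask rule)."""
    spec = dct2(x)
    n = len(spec)
    masked = [[spec[u][v] if low <= u + v < high else 0.0 for v in range(n)]
              for u in range(n)]
    return idct2(masked)

# Hypothetical 4x4 single-channel "latent feature" map.
feature = [[float((i * 4 + j) % 7) for j in range(4)] for i in range(4)]

# Illustrative thresholds; u + v ranges from 0 to 6 for a 4x4 map.
low = band_filter(feature, 0, 2)
mid = band_filter(feature, 2, 4)
high = band_filter(feature, 4, 7)
```

Because the transform is linear and orthonormal and the three bands partition the spectrum, `low`, `mid`, and `high` sum back to `feature` up to floating-point error. In a framework like the one described, each band would be passed on as a separate control signal rather than recombined.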


Taxonomy

Methods: Latent Diffusion Model · Diffusion · Discrete Cosine Transform