TL;DR
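The paper introduces FCDiffusion, a frequency-controlled diffusion framework for text-guided image-to-image translation. It filters the source image's latent features in the DCT domain and feeds different spectral bands to a pre-trained Latent Diffusion Model as control signals, so a single framework can control several image attributes.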
Contribution
It proposes a novel frequency-domain filtering approach within diffusion models for flexible, multi-attribute text-guided image translation, enabling diverse applications with a single framework.
Findings
Effective control over style, structure, and layout in image translation.
Outperforms existing methods in qualitative and quantitative evaluations.
Versatile application across multiple image translation tasks.
Abstract
Recently, large-scale text-to-image (T2I) diffusion models have emerged as a powerful tool for image-to-image translation (I2I), allowing open-domain image translation via user-provided text prompts. This paper proposes frequency-controlled diffusion model (FCDiffusion), an end-to-end diffusion-based framework that contributes a novel solution to text-guided I2I from a frequency-domain perspective. At the heart of our framework is a feature-space frequency-domain filtering module based on Discrete Cosine Transform, which filters the latent features of the source image in the DCT domain, yielding filtered image features bearing different DCT spectral bands as different control signals to the pre-trained Latent Diffusion Model. We reveal that control signals of different DCT spectral bands bridge the source image and the T2I generated image in different correlations (e.g., style,…
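The band filtering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dct_matrix` and `band_filter`, the (C, H, W) feature layout, and the choice to select a band by thresholding the index sum u + v are all assumptions made for illustration. The sketch applies an orthonormal 2D DCT to each channel of a feature map, keeps only the coefficients inside a chosen band, and transforms back.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows are basis vectors)."""
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # spatial index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)     # rescale the DC row so that c @ c.T == I
    return c


def band_filter(feat, lo, hi):
    """Keep DCT coefficients with lo <= u + v < hi in each channel.

    feat: array of shape (C, H, W), a hypothetical latent feature map.
    The u + v band criterion is an assumption, not taken from the paper.
    """
    _, h, w = feat.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    spec = ch @ feat @ cw.T                 # 2D DCT, broadcast over channels
    u = np.arange(h)[:, None]
    v = np.arange(w)[None, :]
    mask = ((u + v) >= lo) & ((u + v) < hi) # boolean band mask of shape (H, W)
    return ch.T @ (spec * mask) @ cw        # inverse 2D DCT of the kept band


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.standard_normal((4, 8, 8))   # stand-in for encoder features
    low_band = band_filter(latent, 0, 5)      # coarse content only
    high_band = band_filter(latent, 5, 16)    # fine detail only
    print(low_band.shape, high_band.shape)
```

Because the transform is linear and orthonormal, disjoint bands that cover the whole spectrum sum back to the original features, which makes it easy to check that a band split loses nothing. In a setup like the one the abstract describes, each filtered map would be one candidate control signal; how it is injected into the diffusion model is outside the scope of this sketch.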
Taxonomy
Methods: Latent Diffusion Model · Diffusion · Discrete Cosine Transform
