Dynamic Frequency Modulation for Controllable Text-driven Image Generation
Tiandong Shi, Ling Zhao, Ji Qi, Jiayi Ma, Chengli Peng

TL;DR
This paper introduces a training-free, frequency-based modulation technique for text-driven image generation. It strengthens control over semantic edits while keeping the image structure consistent, and is reported to outperform existing methods at balancing the two.
Contribution
It proposes a novel frequency modulation approach that directly manipulates latent variables, avoiding empirical feature selection and improving semantic editing stability.
Findings
Outperforms state-of-the-art methods in balancing structure preservation and semantic editing.
Manipulates the frequency spectrum of the noisy latent variables to control structure and fine-grained texture during generation.
Demonstrates robustness across various image editing scenarios.
Abstract
The success of text-guided diffusion models has established a new image generation paradigm driven by the iterative refinement of text prompts. However, modifying the original text prompt to achieve the expected semantic adjustments often results in unintended global structure changes that disrupt user intent. Existing methods rely on empirical feature map selection for intervention, whose performance heavily depends on appropriate selection, leading to suboptimal stability. This paper tries to solve the aforementioned problem from a frequency perspective and analyzes the impact of the frequency spectrum of noisy latent variables on the hierarchical emergence of the structure framework and fine-grained textures during the generation process. We find that lower-frequency components are primarily responsible for establishing the structure framework in the early generation stage. Their…
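The observation that low-frequency components of a noisy latent carry the structure framework can be illustrated with a simple Fourier-domain band split. The following is a minimal sketch, not the authors' method: the function names, the hard circular low-pass mask, the fixed `cutoff` value, and the `(C, H, W)` latent shape are all assumptions made for illustration, and the paper's actual (dynamic) modulation rule is not given in this summary.

```python
import numpy as np


def low_pass_mask(height, width, cutoff):
    """Boolean mask keeping frequencies within `cutoff` (cycles/sample) of DC."""
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.sqrt(fy**2 + fx**2) <= cutoff


def split_frequencies(latent, cutoff):
    """Split a (C, H, W) latent into low- and high-frequency parts."""
    spectrum = np.fft.fft2(latent, axes=(-2, -1))
    mask = low_pass_mask(latent.shape[-2], latent.shape[-1], cutoff)
    # The mask is symmetric in frequency, so the inverse transform is real
    # up to floating-point error; .real drops the negligible imaginary part.
    low = np.fft.ifft2(spectrum * mask, axes=(-2, -1)).real
    return low, latent - low


def swap_low_frequencies(source, target, cutoff):
    """Combine the source's low band (structure) with the target's high band (detail)."""
    source_low, _ = split_frequencies(source, cutoff)
    _, target_high = split_frequencies(target, cutoff)
    return source_low + target_high


# Hypothetical stand-ins for the noisy latents of the original and edited prompts.
rng = np.random.default_rng(0)
source = rng.standard_normal((4, 64, 64))
target = rng.standard_normal((4, 64, 64))
mixed = swap_low_frequencies(source, target, cutoff=0.1)
```

In this sketch `mixed` shares its low-frequency band with `source` and its high-frequency band with `target`. A hard mask with a single fixed cutoff is the simplest choice; a method that varies the modulation over the generation process would need a step-dependent cutoff or weighting, which is not specified here.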
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Image Enhancement Techniques
