Highly Efficient Test-Time Scaling for T2I Diffusion Models with Text Embedding Perturbation
Hang Xu, Linjiang Huang, Feng Zhao

TL;DR
This paper introduces a novel test-time scaling method for text-to-image diffusion models that uses text embedding perturbation to improve diversity and quality by complementing existing noise strategies, with minimal extra computation.
Contribution
The work proposes a new form of randomness, text embedding perturbation, for test-time scaling (TTS) in T2I diffusion models. Frequency-guided perturbation strategies use it to enhance generative diversity and quality.
Findings
Frequency analysis shows complementary behavior of spatial noise and text embedding perturbation.
The method improves benchmark results with minimal additional computation.
Step-based and frequency-guided perturbation strategies are effective in practice.
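The findings above can be illustrated with a toy best-of-N loop: each candidate gets fresh sampling noise and a perturbed text embedding, and a reward picks the best one. This is a minimal sketch and not the authors' method. The isotropic Gaussian perturbation, the `scale` value, and the `toy_generate` and `toy_reward` stand-ins are all hypothetical and replace a real diffusion sampler and reward model.

```python
import numpy as np

def perturb_embedding(text_emb, scale, rng):
    """Add isotropic Gaussian noise to a text embedding (assumed perturbation rule)."""
    return text_emb + scale * rng.standard_normal(text_emb.shape)

def best_of_n(text_emb, generate, reward, n=8, scale=0.05, seed=0):
    """Draw n candidates with perturbed embeddings and keep the highest-reward one."""
    rng = np.random.default_rng(seed)
    best_sample, best_score = None, -np.inf
    for _ in range(n):
        emb = perturb_embedding(text_emb, scale, rng)  # randomness in the text condition
        sample = generate(emb, rng)                    # randomness in the sampler itself
        score = reward(sample)
        if score > best_score:
            best_sample, best_score = sample, score
    return best_sample, best_score

# Hypothetical stand-ins for a diffusion sampler and a reward model.
def toy_generate(emb, rng):
    return emb.mean(axis=0) + 0.1 * rng.standard_normal(emb.shape[1])

def toy_reward(sample):
    return -float(np.linalg.norm(sample))

text_emb = np.ones((77, 16))  # (tokens, channels); the shape is illustrative only
_, score_1 = best_of_n(text_emb, toy_generate, toy_reward, n=1)
_, score_8 = best_of_n(text_emb, toy_generate, toy_reward, n=8)
print(score_1, score_8)
```

With a fixed seed, the first candidate is identical for both calls, so the best score over eight candidates cannot be lower than the score of the single candidate.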
Abstract
Test-time scaling (TTS) aims to achieve better results by increasing random sampling and evaluating samples based on rules and metrics. However, in text-to-image (T2I) diffusion models, most related works focus on search strategies and reward models, yet the impact of the stochastic characteristic of noise in T2I diffusion models on the method's performance remains unexplored. In this work, we analyze the effects of randomness in T2I diffusion models and explore a new format of randomness for TTS: text embedding perturbation, which couples with existing randomness like SDE-injected noise to enhance generative diversity and quality. We start with a frequency-domain analysis of these formats of randomness and their impact on generation, and find that these two forms of randomness exhibit complementary behavior in the frequency domain: spatial noise favors low-frequency components (early steps),…
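One simple way to make a frequency-domain comparison like the one described in the abstract concrete is to measure how much of an image's spectral energy falls below a radial cutoff. The sketch below is an assumption about how such a measurement could look, not the paper's analysis; the function name `band_energy` and the cutoff of 0.25 cycles per pixel are hypothetical.

```python
import numpy as np

def band_energy(image, cutoff=0.25):
    """Return the (low, high) fractions of spectral energy of a 2D array.

    `cutoff` is a radial frequency threshold in cycles per pixel (assumed value).
    """
    power = np.abs(np.fft.fft2(image)) ** 2
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, same ordering as fft2
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies
    radius = np.sqrt(fx ** 2 + fy ** 2)
    low = power[radius <= cutoff].sum() / power.sum()
    return low, 1.0 - low

rng = np.random.default_rng(0)
_, x = np.mgrid[0:64, 0:64]
smooth = np.sin(2 * np.pi * x / 32.0)   # slow horizontal wave: low-frequency content
noisy = rng.standard_normal((64, 64))   # white noise: energy spread over all frequencies

low_smooth, high_smooth = band_energy(smooth)
low_noisy, high_noisy = band_energy(noisy)
print(low_smooth, low_noisy)
```

Applied to images generated with and without a given source of randomness, the same split would show which frequency band that randomness mostly changes.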
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neuroimaging Techniques and Applications · Domain Adaptation and Few-Shot Learning
