DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models
Daewon Chae, June Suk Choi, Jinkyu Kim, Kimin Lee

TL;DR
DiffExp is an exploration strategy for reward fine-tuning of text-to-image diffusion models. It dynamically adjusts the classifier-free guidance scale and randomly weights phrases of the text prompt, which improves sample diversity and sample efficiency.
Contribution
The paper proposes DiffExp, a simple exploration method for reward fine-tuning that combines a dynamically adjusted classifier-free guidance scale (for sample diversity) with random weighting of prompt phrases (to exploit high-quality reward signals).
Findings
Enhanced exploration improves sample efficiency in reward fine-tuning.
Applying DiffExp to existing reward fine-tuning methods such as DDPO and AlignProp improves their sample efficiency.
The gains appear in both model performance and convergence speed.
Abstract
Fine-tuning text-to-image diffusion models to maximize rewards has proven effective for enhancing model performance. However, reward fine-tuning methods often suffer from slow convergence due to online sample generation. Therefore, obtaining diverse samples with strong reward signals is crucial for improving sample efficiency and overall performance. In this work, we introduce DiffExp, a simple yet effective exploration strategy for reward fine-tuning of text-to-image models. Our approach employs two key strategies: (a) dynamically adjusting the scale of classifier-free guidance to enhance sample diversity, and (b) randomly weighting phrases of the text prompt to exploit high-quality reward signals. We demonstrate that these strategies significantly enhance exploration during online sample generation, improving the sample efficiency of recent reward fine-tuning methods, such as DDPO and…
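The two strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The classifier-free guidance combination is the standard one, but the two-phase schedule (weak guidance in early denoising steps, full guidance afterwards), the uniform weight range, and all function names and default values are assumptions made for illustration only.

```python
import random


def guided_noise(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def dynamic_scale(step, num_steps, low=1.0, high=7.5, switch_frac=0.5):
    """Hypothetical schedule (assumption): a low guidance scale early in
    denoising to encourage diverse samples, the usual scale afterwards."""
    return low if step < switch_frac * num_steps else high


def random_phrase_weights(phrases, rng, low=0.5, high=1.5):
    """Assign each prompt phrase an independent random weight.
    The uniform range [low, high] is an assumption, not from the paper."""
    return [(phrase, rng.uniform(low, high)) for phrase in phrases]


rng = random.Random(0)  # seeded so the sketch is reproducible
weighted = random_phrase_weights(["a green rabbit", "a red ball"], rng)
scales = [dynamic_scale(t, 10) for t in range(10)]
```

In an actual fine-tuning loop, the per-step scale would be passed to the sampler at each denoising step, and the phrase weights would be applied to the text-encoder embeddings of the prompt. How either is wired in depends on the diffusion library and is not specified in the text above.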
Taxonomy
Topics: AI in cancer detection · Advanced Data Compression Techniques
Methods: Diffusion
