DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models

Daewon Chae; June Suk Choi; Jinkyu Kim; Kimin Lee

arXiv:2502.14070 · cs.CV · February 21, 2025


TL;DR

DiffExp introduces an exploration strategy for reward fine-tuning of text-to-image diffusion models. It dynamically adjusts the classifier-free guidance scale and randomly weights phrases of the text prompt, improving sample diversity, sample efficiency, and overall performance.

Contribution

The paper proposes DiffExp, a simple exploration method for reward fine-tuning that dynamically adjusts the classifier-free guidance scale and randomly weights prompt phrases during online sample generation, improving both sample diversity and sample efficiency.
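As a rough illustration of guidance-scale adjustment, the sketch below applies the standard classifier-free guidance combination with a step-dependent scale. The two-phase schedule (a low scale for the early, noisy denoising steps and a high scale afterwards), the function names `cfg_combine` and `dynamic_cfg_scales`, and the values `low=1.0`, `high=7.5`, `switch_frac=0.3` are hypothetical choices for illustration, not the paper's exact settings.

```python
import numpy as np


def cfg_combine(eps_uncond: np.ndarray, eps_cond: np.ndarray, scale: float) -> np.ndarray:
    """Standard classifier-free guidance: move from the unconditional noise
    prediction toward the text-conditioned one, amplified by `scale`."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


def dynamic_cfg_scales(num_steps: int, low: float, high: float,
                       switch_frac: float) -> list[float]:
    """Hypothetical two-phase schedule, indexed in denoising order
    (step 0 = noisiest). The first `switch_frac` of the steps use `low`
    (weaker guidance, more varied samples); the rest use `high`
    (stronger prompt adherence)."""
    switch_step = int(round(switch_frac * num_steps))
    return [low if t < switch_step else high for t in range(num_steps)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scales = dynamic_cfg_scales(num_steps=10, low=1.0, high=7.5, switch_frac=0.3)
    print(scales)
    # Toy vectors standing in for the model's unconditional/conditional outputs.
    eps_u = rng.standard_normal(4)
    eps_c = rng.standard_normal(4)
    print(cfg_combine(eps_u, eps_c, scales[0]))
```

Drawing `switch_frac` afresh for each generated image is one hypothetical way to make the schedule vary across samples; in a real pipeline the two noise predictions would come from the diffusion model's conditional and unconditional passes at each step.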

Findings

01

Enhanced exploration improves sample efficiency in reward fine-tuning.

02

Applied on top of existing reward fine-tuning methods such as DDPO and AlignProp, DiffExp improves their performance.

03

Fine-tuning with DiffExp yields significant gains in model performance and convergence speed.

Abstract

Fine-tuning text-to-image diffusion models to maximize rewards has proven effective for enhancing model performance. However, reward fine-tuning methods often suffer from slow convergence due to online sample generation. Therefore, obtaining diverse samples with strong reward signals is crucial for improving sample efficiency and overall performance. In this work, we introduce DiffExp, a simple yet effective exploration strategy for reward fine-tuning of text-to-image models. Our approach employs two key strategies: (a) dynamically adjusting the scale of classifier-free guidance to enhance sample diversity, and (b) randomly weighting phrases of the text prompt to exploit high-quality reward signals. We demonstrate that these strategies significantly enhance exploration during online sample generation, improving the sample efficiency of recent reward fine-tuning methods, such as DDPO and…
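Strategy (b) can be pictured with a small prompt-rewriting sketch. It assumes, hypothetically, that phrases are comma-separated chunks of the prompt, that one phrase per sample is emphasised with a weight drawn uniformly from [1.0, 1.5], and that weights are written in a `(phrase)weight` notation; the function names are illustrative only. How a weight is turned into a change in the text conditioning is left to whichever prompt-weighting mechanism the pipeline uses.

```python
import random


def split_phrases(prompt: str) -> list[str]:
    """Hypothetical phrase splitter: treat comma-separated chunks as phrases."""
    return [p.strip() for p in prompt.split(",") if p.strip()]


def random_phrase_weights(prompt: str, rng: random.Random,
                          w_min: float = 1.0,
                          w_max: float = 1.5) -> list[tuple[str, float]]:
    """Pick one phrase at random and give it an emphasis weight drawn
    uniformly from [w_min, w_max]; all other phrases keep weight 1.0."""
    phrases = split_phrases(prompt)
    if not phrases:
        return []
    target = rng.randrange(len(phrases))
    return [(p, round(rng.uniform(w_min, w_max), 2) if i == target else 1.0)
            for i, p in enumerate(phrases)]


def to_weighted_prompt(pairs: list[tuple[str, float]]) -> str:
    """Render (phrase, weight) pairs as '(phrase)weight'; phrases with
    weight 1.0 are left as plain text."""
    return ", ".join(p if w == 1.0 else f"({p}){w}" for p, w in pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    # A fresh weighting is drawn for every online sample during fine-tuning.
    for _ in range(3):
        pairs = random_phrase_weights("a red apple, a blue cup, on a wooden table", rng)
        print(to_weighted_prompt(pairs))
```

Re-drawing the weighting for every sample means the same prompt is rendered with different emphasis across the batch, which is the intuition behind using phrase weighting as an exploration signal.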


Videos

DiffExp: Efficient Exploration in Reward Fine-tuning for Text-to-Image Diffusion Models · Underline

Taxonomy

Topics: AI in cancer detection · Advanced Data Compression Techniques

Methods: Diffusion