Reward Score Matching: Unifying Reward-based Fine-tuning for Flow and Diffusion Models
Jeongjae Lee, Jinho Chang, Jeongsol Kim, Jong Chul Ye

TL;DR
This paper introduces reward score matching (RSM), a unified framework for reward-based fine-tuning of flow and diffusion models, clarifying existing methods and enabling more efficient redesigns.
Contribution
The paper unifies various reward-based fine-tuning methods under a common framework, revealing core components and enabling simpler, more effective approaches.
Findings
RSM clarifies the bias-variance-compute tradeoffs of existing reward fine-tuning methods.
Redesigns guided by RSM are simpler and more efficient for both differentiable and black-box rewards.
The unified framework separates core optimization components from auxiliary mechanisms that add complexity without clear benefit.
Abstract
Reward-based fine-tuning steers a pretrained diffusion or flow-based generative model toward higher-reward samples while remaining close to the pretrained model. Although existing methods are derived from different perspectives, we show that many can be written under a common framework, which we call reward score matching (RSM). Under this view, alignment becomes score matching against a value-guided target, and the main differences across methods reduce to the construction of the value-guidance estimator and the effective optimization strength across timesteps. This unification clarifies the bias-variance-compute tradeoffs of existing designs, and distinguishes core optimization components from auxiliary mechanisms that add complexity without clear benefit. Guided by this perspective, we develop simpler, more efficient redesigns across representative differentiable and black-box reward…
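The phrase "score matching against a value-guided target" can be made concrete with a short worked sketch. The notation below is an assumption for illustration, not taken from the paper: p_t^ref is the pretrained marginal at time t, r is the reward, beta > 0 is a KL-regularization strength, \hat{g}_t is some estimator of the value gradient, and w(t) is a per-timestep weight. The identity for the tilted score is the standard one for KL-regularized reward tilting; the paper's exact objective may differ.

```latex
% Assumed setup: KL-regularized target, a reward-tilted version of the pretrained model
p^{*}(x_0) \;\propto\; p^{\mathrm{ref}}(x_0)\,\exp\!\big(r(x_0)/\beta\big)

% Soft value function at noise level t (expectation under the pretrained posterior)
V_t(x_t) \;=\; \beta \,\log \mathbb{E}_{p^{\mathrm{ref}}(x_0 \mid x_t)}\!\left[\exp\!\big(r(x_0)/\beta\big)\right]

% Because p^{*}_t(x_t) \propto p^{\mathrm{ref}}_t(x_t)\,\exp(V_t(x_t)/\beta), the target score is
\nabla_{x_t} \log p^{*}_t(x_t) \;=\; \nabla_{x_t} \log p^{\mathrm{ref}}_t(x_t) \;+\; \tfrac{1}{\beta}\,\nabla_{x_t} V_t(x_t)

% Illustrative score-matching loss against the value-guided target,
% with \hat{g}_t \approx \nabla_{x_t} V_t and timestep weight w(t)
\mathcal{L}(\theta) \;=\; \mathbb{E}_{t,\,x_t}\!\left[\, w(t)\,\Big\| s_\theta(x_t, t) - \Big( s^{\mathrm{ref}}(x_t, t) + \tfrac{1}{\beta}\,\hat{g}_t(x_t) \Big) \Big\|^2 \right]
```

Read this way, the two axes the abstract names map onto \hat{g}_t (how the value guidance is estimated, which drives bias, variance, and compute) and w(t) (the effective optimization strength across timesteps).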
