Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

Tao Zhang; Cheng Da; Kun Ding; Huan Yang; Kun Jin; Yan Li; Tingting Gao; Di Zhang; Shiming Xiang; Chunhong Pan

arXiv:2502.01051·cs.CV·October 3, 2025

TL;DR

This paper introduces the Latent Reward Model (LRM) and Latent Preference Optimization (LPO), which use pre-trained diffusion models for step-level preference alignment directly in the noisy latent space, yielding faster training and better preference alignment.

Contribution

The paper proposes a latent-space reward model and a preference optimization method that reuse components of the diffusion model directly, improving efficiency and effectiveness over pixel-based approaches.

Findings

01. Preference alignment improves significantly across multiple criteria.

02. Training is 2.5-28x faster.

03. The approach handles noisy images at various timesteps effectively.

Abstract

Preference optimization for diffusion models aims to align them with human preferences for images. Previous methods typically use Vision-Language Models (VLMs) as pixel-level reward models to approximate human preferences. However, when used for step-level preference optimization, these models face challenges in handling noisy images of different timesteps and require complex transformations into pixel space. In this work, we show that pre-trained diffusion models are naturally suited for step-level reward modeling in the noisy latent space, as they are explicitly designed to process latent images at various noise levels. Accordingly, we propose the Latent Reward Model (LRM), which repurposes components of the diffusion model to predict preferences of latent images at arbitrary timesteps. Building on LRM, we introduce Latent Preference Optimization (LPO), a step-level preference…
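The idea of scoring a pair of noisy latents at the same timestep and turning the two scores into a preference can be illustrated with a small sketch. This is not the paper's implementation: `toy_reward` is a hypothetical linear stand-in for a latent reward model, the latents are plain Python lists, and the pairwise loss is a generic Bradley-Terry objective chosen for illustration. Only the noising step follows the standard DDPM forward process.

```python
import math
import random


def add_noise(x0, alpha_bar_t, rng):
    """Standard DDPM forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * v + b * rng.gauss(0.0, 1.0) for v in x0]


def toy_reward(x_t, t, weights):
    """Hypothetical stand-in for a latent reward model.

    Scores a noisy latent with a linear map and a simple timestep-dependent
    scale. A real model would be a neural network conditioned on t.
    """
    scale = 1.0 / (1.0 + t)
    return scale * sum(w * v for w, v in zip(weights, x_t))


def preference_prob(r_a, r_b):
    """Bradley-Terry probability that A is preferred over B (stable sigmoid)."""
    d = r_a - r_b
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def preference_loss(r_win, r_lose):
    """Negative log-likelihood of the preferred sample, -log sigmoid(r_win - r_lose)."""
    d = r_win - r_lose
    # softplus(-d), written to avoid overflow for large |d|
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


# Usage: noise two clean latents to the same level, score both, compare.
rng = random.Random(0)
weights = [1.0, -0.5, 0.25]      # hypothetical reward parameters
x_win = [0.8, -0.4, 0.2]         # latent assumed to be preferred
x_lose = [-0.3, 0.6, -0.1]       # latent assumed to be dispreferred
t, alpha_bar_t = 10, 0.9         # illustrative timestep and noise level

r_win = toy_reward(add_noise(x_win, alpha_bar_t, rng), t, weights)
r_lose = toy_reward(add_noise(x_lose, alpha_bar_t, rng), t, weights)
print(preference_prob(r_win, r_lose), preference_loss(r_win, r_lose))
```

Because both samples are noised to the same level before scoring, the comparison happens entirely in the noisy latent space, with no decoding to pixels.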

Code & Models

Repositories

kwai-kolors/lpo (PyTorch, official)

Taxonomy

Topics: Product Development and Customization · Advanced Multi-Objective Optimization Algorithms

Methods: ALIGN · Diffusion