Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation
Rui Li, Ke Hao, Yuanzhi Liang, Haibin Huang, Chi Zhang, Yun Gu, XueLong Li

TL;DR
This paper introduces OTCA, a structured framework that improves reward credit assignment in reinforcement learning for visual generation, leading to better image and video quality.
Contribution
It proposes a novel objective-aware credit assignment method that decomposes and adaptively weights multiple reward signals during diffusion-based generative model training.
Findings
OTCA enhances image and video generation quality.
It decomposes credit along both the temporal dimension (denoising steps) and the objective dimension (individual reward models).
Experiments show consistent improvements across the evaluated image and video generation metrics.
Abstract
Reinforcement learning, particularly Group Relative Policy Optimization (GRPO), has emerged as an effective framework for post-training visual generative models with human preference signals. However, its effectiveness is fundamentally limited by coarse reward credit assignment. In modern visual generation, multiple reward models are often used to capture heterogeneous objectives, such as visual quality, motion consistency, and text alignment. Existing GRPO pipelines typically collapse these rewards into a single static scalar and propagate it uniformly across the entire diffusion trajectory. This design ignores the stage-specific roles of different denoising steps and produces mistimed or incompatible optimization signals. To address this issue, we propose Objective-aware Trajectory Credit Assignment (OTCA), a structured framework for fine-grained GRPO training. OTCA consists of two…
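To make the baseline criticized in the abstract concrete, here is a minimal sketch of the "collapse and broadcast" pattern: several reward scores are merged into one scalar with fixed weights, normalized within the sampled group, and the resulting advantage is copied to every denoising step. This illustrates only the coarse baseline, not OTCA itself. The function name, the dictionary layout, the use of population standard deviation, and the `eps` constant are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def collapsed_group_advantages(rewards, weights, num_steps, eps=1e-8):
    """Hypothetical baseline: one static scalar reward per sample,
    broadcast uniformly over the diffusion trajectory.

    rewards:   list of dicts, one per sample in the group,
               mapping objective name -> reward score
    weights:   dict mapping objective name -> fixed (static) weight
    num_steps: number of denoising steps in each trajectory
    """
    # Collapse heterogeneous objectives into a single scalar per sample.
    scalars = [sum(weights[k] * r[k] for k in weights) for r in rewards]

    # Group-relative normalization (population std is an assumption here).
    mu = mean(scalars)
    sigma = pstdev(scalars)
    advantages = [(s - mu) / (sigma + eps) for s in scalars]

    # Every denoising step receives the same advantage value.
    return [[a] * num_steps for a in advantages]


# Two samples that trade off objectives in opposite directions.
group = [
    {"quality": 1.0, "alignment": 0.0},
    {"quality": 0.0, "alignment": 1.0},
]
per_step = collapsed_group_advantages(
    group, {"quality": 0.5, "alignment": 0.5}, num_steps=4
)
```

In this toy group the two samples differ sharply per objective, yet their collapsed scalars are identical, so every step of both trajectories receives a zero advantage. That loss of per-objective and per-step information is the coarse credit assignment the abstract describes.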
