Anchoring Values in Temporal and Group Dimensions for Flow Matching Model Alignment
Yawen Shao, Jie Xiao, Kai Zhu, Yu Liu, Wei Zhai, Yang Cao, Zheng-Jun Zha

TL;DR
This paper introduces VGPO, a framework that improves the alignment of flow matching models for image generation by anchoring values in the temporal and group dimensions, addressing sparse terminal rewards and optimization stagnation.
Contribution
VGPO redefines value estimation in flow matching models by incorporating dense, process-aware values and absolute rewards, enhancing alignment and stability in image generation.
Findings
VGPO achieves state-of-the-art image quality on benchmarks.
VGPO improves task-specific accuracy and reduces reward hacking.
VGPO maintains stable optimization signals during training.
Abstract
Group Relative Policy Optimization (GRPO) has proven highly effective in enhancing the alignment capabilities of Large Language Models (LLMs). However, current adaptations of GRPO for flow matching-based image generation neglect a foundational conflict between its core principles and the distinct dynamics of the visual synthesis process. This mismatch leads to two key limitations: (i) Uniformly applying a sparse terminal reward across all timesteps impairs temporal credit assignment, ignoring the differing criticality of generation phases, from early structure formation to late-stage tuning. (ii) Exclusive reliance on relative, intra-group rewards causes the optimization signal to fade as training converges, leading to optimization stagnation once reward diversity is entirely depleted. To address these limitations, we propose Value-Anchored Group Policy Optimization (VGPO), a…
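Both limitations can be illustrated with a minimal sketch of the standard GRPO-style advantage computation, assuming that terminal rewards are standardized within a group of images sampled for one prompt and that each sample's advantage is then copied to every denoising step. This is not VGPO; the function names (group_relative_advantages, broadcast_over_timesteps), the group size, the step count, and the reward values are hypothetical. It uses the population standard deviation; implementations differ on this detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage (illustrative): standardize each terminal
    reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_over_timesteps(advantages, num_steps):
    """Limitation (i), illustrative: every denoising step of a trajectory
    receives the same terminal advantage, whatever its generation phase."""
    return [[a] * num_steps for a in advantages]


# Hypothetical group of 4 images generated from one prompt, scored by a reward model.
diverse = [0.2, 0.5, 0.7, 0.9]
saturated = [0.95, 0.95, 0.95, 0.95]  # limitation (ii): reward diversity depleted

adv_diverse = group_relative_advantages(diverse)
adv_saturated = group_relative_advantages(saturated)
per_step = broadcast_over_timesteps(adv_diverse, num_steps=10)

print([round(a, 3) for a in adv_diverse])
print(adv_saturated)  # all zeros: identical rewards give no relative signal
```

With identical rewards the group-relative advantage is zero for every sample, so the policy-gradient signal vanishes even if the absolute reward could still be improved. In the diverse case, each trajectory's 10 per-step advantages are identical.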
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
