Know Your Step: Faster and Better Alignment for Flow Matching Models via Step-aware Advantages
Zhixiong Yue, Zixuan Ni, Feiyang Ye, Jinshan Zhang, Sheng Shen, Zhenpeng Mi

TL;DR
This paper introduces TAFS GRPO, a training framework that improves few-step flow matching models for text-to-image generation, aligning them more closely with human preferences through adaptive noise injection and step-aware advantages.
Contribution
The paper proposes TAFS GRPO, a method that improves few-step flow matching models by combining adaptive temporal noise injection with a step-aware advantage mechanism, without requiring a differentiable reward function.
Findings
Significantly improves few-step text-to-image generation quality.
Achieves better alignment with human preferences.
Demonstrates strong performance in experimental evaluations.
Abstract
Recent advances in flow matching models, particularly with reinforcement learning (RL), have significantly enhanced human preference alignment in few-step text-to-image generators. However, existing RL-based approaches for flow matching models typically rely on numerous denoising steps and suffer from sparse, imprecise reward signals that often lead to suboptimal alignment. To address these limitations, we propose Temperature Annealed Few step Sampling with Group Relative Policy Optimization (TAFS GRPO), a novel framework for training flow matching text-to-image models into efficient few-step generators that are well aligned with human preferences. Our method iteratively injects adaptive temporal noise onto the results of one-step samples. By repeatedly annealing the model's sampled outputs, it introduces stochasticity into the sampling process while preserving the semantic integrity of…
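The "group relative" part of the method name refers to scoring each sample against the other samples generated for the same prompt. A minimal sketch of that generic GRPO normalization follows. It is not the paper's step-aware advantage, whose details are not given in this summary. The function name, the example reward values, the use of the population standard deviation, and the `eps` stabilizer are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of samples drawn for one prompt.

    Each advantage is the reward's distance from the group mean, in units
    of the group's standard deviation. `eps` (an assumed stabilizer)
    avoids division by zero when every reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four images of the same prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Because the advantages are computed only from scalar reward values, this kind of normalization needs no gradient through the reward model, which matches the stated goal of avoiding differentiable reward requirements.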
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Artificial Intelligence in Games
