Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

Xiaoxuan He; Siming Fu; Zeyue Xue; Weijie Wang; Ruizhe He; Yuming Li; Dacheng Yin; Shuai Dong; Haoyang Huang; Hongfa Wang; Nan Duan; Bohan Zhuang

arXiv:2605.15980 · cs.CV · May 18, 2026

TL;DR

Flash-GRPO introduces a single-step GRPO training method for video diffusion models that substantially improves training efficiency and stability and, under low computational budgets, aligns models with human preferences better than full-trajectory training.

Contribution

The paper proposes Flash-GRPO, a one-step policy optimization framework that avoids the training instability of timestep-subsampling methods and improves training efficiency for large-scale video diffusion models.

Findings

01. Outperforms full-trajectory training in alignment quality under low computational budgets.
02. Substantially reduces training time while maintaining training stability.
03. Validated on video diffusion models with up to 14B parameters.

Abstract

Group Relative Policy Optimization (GRPO) has become essential for aligning video diffusion models with human preferences, but it faces a critical computational bottleneck: training a 14B-parameter model typically demands hundreds of GPU-days per experiment. Existing efficiency methods reduce cost by subsampling training timesteps with a sliding window, but they fundamentally compromise optimization, exhibiting severe instability and failing to reach full-trajectory performance. We present Flash-GRPO, a single-step training framework that outperforms full-trajectory training in alignment quality under low computational budgets while substantially improving training efficiency. Flash-GRPO addresses two critical challenges: iso-temporal grouping eliminates timestep-confounded variance by enforcing prompt-wise temporal consistency, decoupling policy performance from timestep difficulty; temporal…
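
The abstract describes iso-temporal grouping in a single clause. The Python sketch below shows one possible reading of it, under stated assumptions: GRPO advantages are computed by standardizing rewards within each prompt's group, and the one trained timestep is drawn uniformly, once per prompt, then shared by every sample in that group. `Sample`, `build_iso_temporal_groups`, `num_timesteps`, the prompts, and the reward values are hypothetical illustrations, not names or settings taken from the Flash-GRPO paper or repository.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str
    timestep: int  # index of the single denoising step trained for this sample


def build_iso_temporal_groups(prompts, group_size, num_timesteps, seed=0):
    """Hypothetical helper: draw ONE timestep per prompt and share it across the group.

    Because every sample in a group is trained at the same timestep, reward
    differences inside the group cannot be explained by some samples landing
    on an easier or harder timestep than others.
    """
    rng = random.Random(seed)
    groups = {}
    for prompt in prompts:
        # Assumption: uniform draw, sampled once and reused for the whole group.
        t = rng.randrange(num_timesteps)
        groups[prompt] = [Sample(prompt, t) for _ in range(group_size)]
    return groups


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize rewards within a single group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    groups = build_iso_temporal_groups(
        ["a cat surfing a wave", "a city street at night"],  # placeholder prompts
        group_size=4,
        num_timesteps=50,  # assumed schedule length, not from the paper
    )
    for prompt, samples in groups.items():
        print(prompt, "-> shared timestep", samples[0].timestep)

    rewards = [0.2, 0.5, 0.9, 0.4]  # placeholder reward-model scores, not real data
    print([round(a, 3) for a in group_relative_advantages(rewards)])
```

The design point being illustrated: if each sample in a group drew its own timestep, the group mean would partly reflect how hard each sample's timestep was, so the normalized advantage would mix policy quality with timestep difficulty. Sharing one timestep per group removes that confound from the within-group comparison.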

Code & Models

Repositories

shredded-pork/Flash-GRPO (GitHub)
