KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

Ruicheng Zhang; Kaixi Cong; Jun Zhou; Zhizhou Zhong; Zunnan Xu; Shuiyang Mao; Wei Liu; Xiu Li

arXiv:2605.14278 · cs.CV · May 15, 2026


TL;DR

KVPO introduces an ODE-native framework for aligning autoregressive video generators with human preferences, leveraging semantic exploration and velocity-based policy modeling to improve visual and motion quality.

Contribution

KVPO proposes an ODE-native policy optimization method that pairs semantic exploration through the historical KV cache with a velocity-based surrogate policy, improving the alignment of autoregressive video generators.

Findings

01 KVPO achieves consistent improvements in visual quality.

02 KVPO also improves motion quality and text-video alignment.

03 KVPO is effective on both short and long video generation tasks.

Abstract

Aligning streaming autoregressive (AR) video generators with human preferences is challenging. Existing reinforcement learning methods predominantly rely on noise-based exploration and SDE-based surrogate policies that are mismatched to the deterministic ODE dynamics of distilled AR models, and tend to perturb low-level appearance rather than the high-level semantic storyline progression critical for long-horizon coherence. To address these limitations, we present KVPO, an ODE-native online Group Relative Policy Optimization (GRPO) framework for aligning streaming video generators. For diversity exploration, KVPO introduces a causal-semantic exploration paradigm that relocates the source of variation from stochastic noise to the historical KV cache. By stochastically routing historical KV entries, it constructs semantically diverse generation branches that remain strictly on the data…
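The abstract combines two mechanisms: building a group of generation branches by stochastically routing historical KV entries, then scoring those branches with group-relative advantages. The minimal Python sketch below illustrates that loop under stated assumptions. It is not KVPO's implementation. The `KVEntry` structure, the helpers `route_kv_cache` and `build_branches`, the independent per-entry keep probability `keep_prob`, and the rule that the most recent entries are always kept are all hypothetical choices made for illustration. The advantage function uses the standard GRPO normalization (reward minus group mean, divided by group standard deviation), and the abstract does not say whether KVPO modifies it.

```python
from __future__ import annotations

import random
import statistics
from dataclasses import dataclass


@dataclass
class KVEntry:
    """Hypothetical cached key/value block for one previously generated chunk."""
    chunk_index: int
    keys: list[float]
    values: list[float]


def route_kv_cache(
    cache: list[KVEntry],
    keep_prob: float,
    rng: random.Random,
    always_keep_last: int = 1,
) -> list[KVEntry]:
    """Randomly drop historical entries; the newest `always_keep_last` are always kept.

    Each older entry survives independently with probability `keep_prob`, and
    chronological order is preserved. (Illustrative assumption, not KVPO's rule.)
    """
    if not 0.0 <= keep_prob <= 1.0:
        raise ValueError("keep_prob must be in [0, 1]")
    cutoff = max(len(cache) - always_keep_last, 0)
    return [
        entry
        for i, entry in enumerate(cache)
        if i >= cutoff or rng.random() < keep_prob
    ]


def build_branches(
    cache: list[KVEntry], group_size: int, keep_prob: float, seed: int = 0
) -> list[list[KVEntry]]:
    """Build one routed cache per branch; a seeded RNG makes the group reproducible."""
    rng = random.Random(seed)
    return [route_kv_cache(cache, keep_prob, rng) for _ in range(group_size)]


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO normalization: (r_i - mean) / (std + eps) within one group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Six hypothetical past chunks with toy key/value payloads.
    history = [KVEntry(i, [float(i)], [float(i)]) for i in range(6)]
    branches = build_branches(history, group_size=4, keep_prob=0.5, seed=0)
    for branch in branches:
        print([entry.chunk_index for entry in branch])
    # Placeholder rewards; in practice these would come from a reward model.
    print(group_relative_advantages([0.2, 0.5, 0.9, 0.4]))
```

In this sketch the sampler itself is never made stochastic: branches differ only in which history they condition on, which mirrors the abstract's idea of relocating the source of variation from noise to the KV cache. Because advantages are normalized within the group, they are unchanged if every reward in the group is shifted by a constant or rescaled by the same positive factor (up to the small `eps` term).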


Code & Models

Repositories

richard-zhang-ai/KVPO (GitHub)
