AR-CoPO: Align Autoregressive Video Generation with Contrastive Policy Optimization

Dailan He; Guanlin Feng; Xingtong Ge; Yi Zhang; Bingqi Ma; Guanglu Song; Yu Liu; Hongsheng Li

arXiv:2603.17461 · cs.CV · March 19, 2026




TL;DR

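AR-CoPO is a framework for aligning streaming autoregressive video generators with human preferences through reinforcement learning. It adapts the contrastive perspective of Neighbor GRPO to chunk-by-chunk generation, and is reported to improve both in-domain preference alignment and out-of-domain generalization.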

Contribution

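AR-CoPO introduces chunk-level alignment, which forks neighborhood candidates at a randomly selected chunk and applies localized GRPO updates, together with a semi-on-policy training strategy for aligning autoregressive video generators with human feedback.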

Findings

01. Improves out-of-domain generalization.

02. Enhances in-domain human preference alignment.

03. Demonstrates genuine alignment over reward hacking.

Abstract

Streaming autoregressive (AR) video generators combined with few-step distillation achieve low-latency, high-quality synthesis, yet remain difficult to align via reinforcement learning from human feedback (RLHF). Existing SDE-based GRPO methods face challenges in this setting: few-step ODEs and consistency model samplers deviate from standard flow-matching ODEs, and their short, low-stochasticity trajectories are highly sensitive to initialization noise, rendering intermediate SDE exploration ineffective. We propose AR-CoPO (AutoRegressive Contrastive Policy Optimization), a framework that adapts the Neighbor GRPO contrastive perspective to streaming AR generation. AR-CoPO introduces chunk-level alignment via a forking mechanism that constructs neighborhood candidates at a randomly selected chunk, assigns sequence-level rewards, and performs localized GRPO updates. We further propose a…
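The forking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the additive Gaussian perturbation used to build neighbors, the toy reward, and every name here (`fork_candidates`, `group_relative_advantages`, `toy_reward`, `sigma`) are assumptions made for illustration. It shows only the general shape: pick one chunk at random, build a group of candidates that differ at that chunk alone, score each full sequence, and normalize the rewards within the group as in GRPO.

```python
import random
import statistics


def fork_candidates(chunk_noises, fork_index, group_size, sigma, rng):
    """Return group_size copies of the sequence that differ only at fork_index.

    Assumption: a neighbor is built by adding Gaussian noise to the forked
    chunk's initial noise; the paper's actual construction may differ.
    """
    candidates = []
    for _ in range(group_size):
        forked = [list(chunk) for chunk in chunk_noises]
        forked[fork_index] = [
            x + sigma * rng.gauss(0.0, 1.0) for x in chunk_noises[fork_index]
        ]
        candidates.append(forked)
    return candidates


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within the group (zero mean, unit scale)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def toy_reward(candidate):
    """Hypothetical stand-in for a reward model scoring the whole sequence."""
    return -sum(x * x for chunk in candidate for x in chunk)


rng = random.Random(0)
num_chunks, dim, group_size = 4, 3, 6

# One initial-noise vector per chunk of the streamed video.
base = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_chunks)]

# Fork at a randomly selected chunk and score each full sequence.
fork_index = rng.randrange(num_chunks)
group = fork_candidates(base, fork_index, group_size, sigma=0.5, rng=rng)
rewards = [toy_reward(candidate) for candidate in group]
advantages = group_relative_advantages(rewards)
```

In a real training loop the advantages would weight a policy update restricted to the forked chunk; that update and the reward model are omitted because the truncated abstract does not specify them.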


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics