VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Shikun Sun; Liao Qu; Huichao Zhang; Yiheng Liu; Yangyang Song; Xian Li; Xu Wang; Yi Jiang; Daniel K. Du; Xinglong Wu; Jia Jia

arXiv:2601.02256 · cs.CV · January 6, 2026

TL;DR

This paper introduces a reinforcement learning framework for Visual AutoRegressive (VAR) models that explicitly manages asynchronous policy conflicts across generation steps, improving sample quality and objective alignment over the baseline.

Contribution

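The paper proposes an enhanced Group Relative Policy Optimization (GRPO) method with three components (a stabilizing intermediate reward, a dynamic time-step reweighting scheme, and a mask propagation algorithm) to address asynchronous policy conflicts in VAR models during RL training.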

Findings

01 Significant improvements in sample quality over baseline

02 Enhanced objective alignment in VAR models

03 Robust and effective optimization demonstrated

Abstract

Visual generation is dominated by three paradigms: AutoRegressive (AR), diffusion, and Visual AutoRegressive (VAR) models. Unlike AR and diffusion, VARs operate on heterogeneous input structures across their generation steps, which creates severe asynchronous policy conflicts. This issue becomes particularly acute in reinforcement learning (RL) scenarios, leading to unstable training and suboptimal alignment. To resolve this, we propose a novel framework to enhance Group Relative Policy Optimization (GRPO) by explicitly managing these conflicts. Our method integrates three synergistic components: 1) a stabilizing intermediate reward to guide early-stage generation; 2) a dynamic time-step reweighting scheme for precise credit assignment; and 3) a novel mask propagation algorithm, derived from principles of Reward Feedback Learning (ReFL), designed to isolate optimization effects both…
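To make the GRPO starting point concrete, here is a minimal sketch of group-relative advantage computation followed by a per-step reweighting. The normalization (reward minus group mean, divided by group standard deviation) is the standard GRPO form. The reweighting rule is only an illustrative assumption: the abstract does not specify the paper's scheme, so the `step_weights` values and the `reweight_per_step` helper are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style normalization within one group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def reweight_per_step(advantage, step_weights):
    """Split one sample's advantage across generation steps.

    Hypothetical rule (not from the paper): each step receives a share
    proportional to its weight, so the shares sum back to the advantage.
    """
    total = sum(step_weights)
    return [advantage * w / total for w in step_weights]


# One group of four samples generated for the same prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advs = group_relative_advantages(rewards)

# Assumed weights for three generation steps of differing size.
step_weights = [1, 4, 16]
per_step = [reweight_per_step(a, step_weights) for a in advs]
```

Normalizing within the group removes the need for a learned value function, while the per-step split shows where a reweighting scheme would enter the loss. How the weights should actually be chosen is the question the paper's second component addresses.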

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics