Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

Songchun Zhang, Zeyue Xue, Siming Fu, Jie Huang, Xianghao Kong, Y Ma, Haoyang Huang, Nan Duan, and Anyi Rao

arXiv:2603.17051 · cs.CV · March 19, 2026

TL;DR

Astrolabe introduces an efficient online reinforcement learning framework tailored for distilled autoregressive video models, improving alignment with human preferences without extensive re-distillation or high computational costs.

Contribution

The paper presents a novel forward-process RL formulation with negative-aware fine-tuning and a streaming training scheme for long videos, enabling scalable and effective alignment of AR video models.

Findings

01. Consistently improves generation quality across multiple models

02. Reduces computational overhead compared to existing methods

03. Enhances long-range coherence in video generation

Abstract

Distilled autoregressive (AR) video models enable efficient streaming generation but frequently misalign with human visual preferences. Existing reinforcement learning (RL) frameworks are not naturally suited to these architectures, typically requiring either expensive re-distillation or solver-coupled reverse-process optimization that introduces considerable memory and computational overhead. We present Astrolabe, an efficient online RL framework tailored for distilled AR models. To overcome existing bottlenecks, we introduce a forward-process RL formulation based on negative-aware fine-tuning. By contrasting positive and negative samples directly at inference endpoints, this approach establishes an implicit policy improvement direction without requiring reverse-process unrolling. To scale this alignment to long videos, we propose a streaming training scheme that generates sequences…
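To make the idea of contrasting positive and negative samples on a forward-process objective more concrete, here is a toy sketch in plain Python. It is not the paper's formulation: the min-max reward weighting, the `beta` interpolation between an old and a new prediction, and every function name are assumptions chosen for illustration. The sketch only shows how a single loss on noised endpoints can pull the new model toward well-rewarded samples and push it away from poorly rewarded ones, with no reverse-process unrolling involved.

```python
from typing import List, Sequence


def reward_weights(rewards: Sequence[float]) -> List[float]:
    """Map a group of rewards to weights in [0, 1] (assumed min-max scheme).

    1.0 marks the most-preferred sample in the group, 0.0 the least.
    """
    lo, hi = min(rewards), max(rewards)
    if hi == lo:
        # No preference signal in this group: treat every sample as neutral.
        return [0.5] * len(rewards)
    return [(r - lo) / (hi - lo) for r in rewards]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def contrastive_forward_loss(
    pred_new: Sequence[float],
    pred_old: Sequence[float],
    target: Sequence[float],
    weight: float,
    beta: float = 1.0,
) -> float:
    """Hypothetical negative-aware loss on one forward-process (noised) sample.

    pred_new: prediction of the model being trained
    pred_old: prediction of the frozen sampling model
    target:   regression target derived from the generated sample
    weight:   reward weight in [0, 1] from reward_weights
    """
    # Implicit "positive" prediction: old prediction moved toward the new one.
    pos = [(1 - beta) * o + beta * n for n, o in zip(pred_new, pred_old)]
    # Implicit "negative" prediction: old prediction moved away from the new one.
    neg = [(1 + beta) * o - beta * n for n, o in zip(pred_new, pred_old)]
    # Preferred samples fit the positive branch, dispreferred the negative one.
    return weight * mse(pos, target) + (1 - weight) * mse(neg, target)


if __name__ == "__main__":
    old, target = [0.0, 0.0], [1.0, 1.0]
    toward, away = [0.5, 0.5], [-0.5, -0.5]
    for w in reward_weights([1.0, 3.0, 2.0]):
        print(w,
              contrastive_forward_loss(toward, old, target, w),
              contrastive_forward_loss(away, old, target, w))
```

In this toy setup, a sample with weight 1.0 yields a lower loss when the new prediction moves toward its target, while a sample with weight 0.0 yields a lower loss when the new prediction moves away from it, which is the contrast the abstract describes at a high level.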

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition