Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models
Songchun Zhang, Zeyue Xue, Siming Fu, Jie Huang, Xianghao Kong, Y Ma, Haoyang Huang, Nan Duan, and Anyi Rao

TL;DR
Astrolabe introduces an efficient online reinforcement learning framework tailored for distilled autoregressive video models, improving alignment with human preferences without extensive re-distillation or high computational costs.
Contribution
The paper presents a novel forward-process RL formulation with negative-aware fine-tuning and a streaming training scheme for long videos, enabling scalable and effective alignment of AR video models.
Findings
Consistently improves generation quality across multiple models
Reduces computational overhead compared to existing methods
Enhances long-range coherence in video generation
Abstract
Distilled autoregressive (AR) video models enable efficient streaming generation but frequently misalign with human visual preferences. Existing reinforcement learning (RL) frameworks are not naturally suited to these architectures, typically requiring either expensive re-distillation or solver-coupled reverse-process optimization that introduces considerable memory and computational overhead. We present Astrolabe, an efficient online RL framework tailored for distilled AR models. To overcome existing bottlenecks, we introduce a forward-process RL formulation based on negative-aware fine-tuning. By contrasting positive and negative samples directly at inference endpoints, this approach establishes an implicit policy improvement direction without requiring reverse-process unrolling. To scale this alignment to long videos, we propose a streaming training scheme that generates sequences…
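The contrast between positive and negative samples described in the abstract can be illustrated with a small sketch. This is not the paper's loss: the function names (`normalize_rewards`, `negative_aware_loss`), the mixing weight `beta`, the min-max reward normalization, and the specific way the implicit positive and negative predictions are formed from the current and frozen models are all assumptions made for illustration. The sketch only shows the general idea of a regression loss on noised (forward-process) samples that needs no reverse-process unrolling.

```python
def normalize_rewards(raw_rewards):
    """Map a group of raw rewards to [0, 1] by min-max scaling (an assumption)."""
    lo, hi = min(raw_rewards), max(raw_rewards)
    if hi == lo:
        # All samples equally good: treat each as half positive, half negative.
        return [0.5] * len(raw_rewards)
    return [(r - lo) / (hi - lo) for r in raw_rewards]


def negative_aware_loss(v_theta, v_old, v_target, reward, beta=1.0):
    """Hypothetical per-sample loss evaluated on one noised sample.

    v_theta  : prediction of the model being trained (flat list of floats)
    v_old    : prediction of the frozen sampling model (same length)
    v_target : forward-process regression target (same length)
    reward   : value in [0, 1]; 1 means fully positive, 0 fully negative
    beta     : hypothetical mixing weight between the two models
    """
    if not 0.0 <= reward <= 1.0:
        raise ValueError("reward must lie in [0, 1]")
    pos_err = 0.0
    neg_err = 0.0
    for vt, vo, tgt in zip(v_theta, v_old, v_target):
        # Implicit positive prediction: move from the old model toward the new one.
        v_pos = (1.0 - beta) * vo + beta * vt
        # Implicit negative prediction: the mirror image around the old model.
        v_neg = (1.0 + beta) * vo - beta * vt
        pos_err += (v_pos - tgt) ** 2
        neg_err += (v_neg - tgt) ** 2
    n = len(v_theta)
    # High-reward samples pull the positive branch toward the target;
    # low-reward samples pull the negative branch toward it instead.
    return reward * pos_err / n + (1.0 - reward) * neg_err / n
```

Under these assumptions, a sample with reward 1 contributes only the positive term and a sample with reward 0 only the negative term, so minimizing the loss pushes the trained model toward high-reward outputs and away from low-reward ones using forward-process targets alone.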
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
