Reward-Forcing: Autoregressive Video Generation with Reward Feedback
Jingran Zhang, Ning Li, Yuanhao Ban, Andrew Bai, Justin Cui

TL;DR
This paper introduces a reward-guided autoregressive video generation method that simplifies training and achieves high-quality, temporally consistent videos, rivaling or surpassing bidirectional models without relying on teacher models.
Contribution
The paper presents a novel reward-based training approach for autoregressive video generation, eliminating the need for teacher models and improving scalability and performance.
Findings
Achieves a total score of 84.92 on VBench, close to state-of-the-art autoregressive methods.
Outperforms similar-sized bidirectional models by avoiding teacher model constraints.
Simplifies training while maintaining high visual fidelity and temporal consistency.
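The idea of training an autoregressive generator from a reward signal alone, with no teacher model, can be illustrated with a deliberately tiny sketch. This is not the paper's method: the summary above does not specify the reward model, the architecture, or the optimizer, so everything below is an assumption made for illustration. Frames are replaced by discrete tokens, the hypothetical `reward` function stands in for a learned reward model by scoring how often consecutive frames match (a crude proxy for temporal consistency), and the update is a generic score-function (REINFORCE) estimator with a running baseline.

```python
import math
import random

K = 4  # hypothetical number of distinct frame "tokens"
T = 8  # hypothetical number of frames per clip


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_clip(logits, rng):
    """Generate frames one at a time, each conditioned on the previous frame."""
    frames = [rng.randrange(K)]
    for _ in range(T - 1):
        probs = softmax(logits[frames[-1]])
        frames.append(rng.choices(range(K), weights=probs)[0])
    return frames


def reward(frames):
    """Toy stand-in for a reward model: fraction of steps with an unchanged frame."""
    same = sum(a == b for a, b in zip(frames, frames[1:]))
    return same / (len(frames) - 1)


def train(steps=3000, lr=0.2, seed=0):
    """Reward-only training: no teacher model and no target clips are used."""
    rng = random.Random(seed)
    logits = [[0.0] * K for _ in range(K)]  # logits[prev][next]
    baseline = 0.0
    for _ in range(steps):
        frames = sample_clip(logits, rng)
        r = reward(frames)
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)  # running mean reduces variance
        # Accumulate grad of log-probability under the logits used for sampling.
        grads = [[0.0] * K for _ in range(K)]
        for prev, nxt in zip(frames, frames[1:]):
            probs = softmax(logits[prev])
            for k in range(K):
                grads[prev][k] += (1.0 if k == nxt else 0.0) - probs[k]
        for i in range(K):
            for k in range(K):
                logits[i][k] += lr * advantage * grads[i][k]
    return logits


def mean_reward(logits, n=500, seed=1):
    rng = random.Random(seed)
    return sum(reward(sample_clip(logits, rng)) for _ in range(n)) / n


if __name__ == "__main__":
    untrained = [[0.0] * K for _ in range(K)]
    print("mean reward before training:", mean_reward(untrained))
    print("mean reward after training: ", mean_reward(train()))
```

Running the script prints the average toy reward of the untrained and trained generators. A real system would replace the token table with a video model and the matching heuristic with a learned reward, and might backpropagate through a differentiable reward rather than use a score-function estimator.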
Abstract
While most prior work in video generation relies on bidirectional architectures, recent efforts have sought to adapt these models into autoregressive variants to support near real-time generation. However, such adaptations often depend heavily on teacher models. This dependence can limit performance, particularly in the absence of a strong autoregressive teacher, so output quality typically lags behind that of the bidirectional counterparts. In this paper, we explore an alternative approach that uses reward signals to guide the generation process, enabling more efficient and scalable autoregressive generation. This reward guidance simplifies training while preserving high visual fidelity and temporal consistency. Through extensive experiments on standard benchmarks, we find that our approach performs comparably to existing autoregressive models and, in…
