Accelerating Training of Autoregressive Video Generation Models via Local Optimization with Representation Continuity
Yucheng Zhou, Jianbing Shen

TL;DR
This paper introduces a Local Optimization method with Representation Continuity to accelerate autoregressive video generation training, reducing costs by half while maintaining quality.
Contribution
It proposes novel techniques—Local Optimization and Representation Continuity—to improve training efficiency and consistency in autoregressive video models.
Findings
Halves training cost without quality loss.
Local Optimization reduces error propagation.
Representation Continuity enhances video consistency.
Abstract
Autoregressive models have shown superior performance and efficiency in image generation, but remain constrained by high computational costs and prolonged training times in video generation. In this study, we explore methods to accelerate training for autoregressive video generation models through empirical analyses. Our results reveal that while training on fewer video frames significantly reduces training time, it also exacerbates error accumulation and introduces inconsistencies in the generated videos. To address these issues, we propose a Local Optimization (Local Opt.) method, which optimizes tokens within localized windows while leveraging contextual information to reduce error propagation. Inspired by Lipschitz continuity, we propose a Representation Continuity (ReCo) strategy to improve the consistency of generated videos. ReCo utilizes continuity loss to constrain…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
