Accelerating Training of Autoregressive Video Generation Models via Local Optimization with Representation Continuity

Yucheng Zhou; Jianbing Shen

arXiv:2604.07402 · cs.LG · April 10, 2026

TL;DR

This paper introduces Local Optimization (Local Opt.) and Representation Continuity (ReCo) to accelerate the training of autoregressive video generation models, halving training cost while maintaining generation quality.

Contribution

It proposes two techniques, Local Optimization and Representation Continuity, to improve training efficiency and the consistency of generated videos in autoregressive video models.

Findings

01. Halves training cost without quality loss.
02. Local Optimization reduces error propagation.
03. Representation Continuity enhances video consistency.

Abstract

Autoregressive models have shown superior performance and efficiency in image generation, but remain constrained by high computational costs and prolonged training times in video generation. In this study, we explore methods to accelerate training for autoregressive video generation models through empirical analyses. Our results reveal that while training on fewer video frames significantly reduces training time, it also exacerbates error accumulation and introduces inconsistencies in the generated videos. To address these issues, we propose a Local Optimization (Local Opt.) method, which optimizes tokens within localized windows while leveraging contextual information to reduce error propagation. Inspired by Lipschitz continuity, we propose a Representation Continuity (ReCo) strategy to improve the consistency of generated videos. ReCo utilizes continuity loss to constrain…
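The two ideas named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated and gives no formulas, so the function names `local_window_loss` and `continuity_penalty`, the window bounds `start` and `end`, the Lipschitz-style bound `k`, and the hinge form of the penalty are all assumptions made for illustration. The first function averages the next-token loss over one window of positions only; the second penalizes consecutive frames whose representation changes faster than `k` times the change in the frames themselves.

```python
import math


def local_window_loss(logits, targets, start, end):
    """Mean cross-entropy over token positions in [start, end) only.

    Hypothetical sketch: positions outside the window contribute no loss.
    In a real model they would still be fed as context in the forward pass;
    here the per-position logits are simply given as lists of floats.
    """
    losses = []
    for t in range(start, end):
        row = logits[t]
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(log_z - row[targets[t]])  # negative log-likelihood
    return sum(losses) / len(losses)


def continuity_penalty(reprs, frames, k=1.0, eps=1e-8):
    """Hinge penalty on representation jumps between consecutive frames.

    Assumed form (not from the paper): for each pair of neighbouring frames,
    compare the distance between their representations with the distance
    between the frames, and penalize the amount by which the ratio exceeds k.
    """
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    excess = []
    for t in range(len(reprs) - 1):
        ratio = dist(reprs[t + 1], reprs[t]) / (dist(frames[t + 1], frames[t]) + eps)
        excess.append(max(ratio - k, 0.0))
    return sum(excess) / len(excess)


if __name__ == "__main__":
    # Six token positions, vocabulary of four, uniform logits.
    logits = [[0.0] * 4 for _ in range(6)]
    targets = [0] * 6
    print(local_window_loss(logits, targets, start=2, end=5))

    # Three toy "frames" and representations that change three times as fast.
    frames = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
    reprs = [[3.0 * v for v in f] for f in frames]
    print(continuity_penalty(reprs, frames, k=1.0))
```

In a training loop, a term like `continuity_penalty` would typically be added to the windowed loss with a weighting coefficient; that weighting is likewise an assumption here, since the abstract does not say how the two objectives are combined.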
