HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising
Kai Zou, Dian Zheng, Hongbo Liu, Tiankai Hang, Bin Liu, Nenghai Yu

TL;DR
HiAR introduces hierarchical denoising for autoregressive long video generation: each block is conditioned on context at the same noise level rather than on highly denoised frames, which enables efficient, high-quality, and temporally consistent video synthesis.
Contribution
The paper proposes a hierarchical denoising framework that reverses the generation order and conditions each block on context at the same noise level, improving efficiency and temporal coherence in long video generation.
Findings
Achieves a 1.8x speedup with pipelined parallel inference.
Outperforms existing methods on VBench with better temporal consistency.
Introduces a bidirectional attention regularizer to maintain motion diversity.
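To make the pipelining claim concrete, here is a minimal scheduling sketch in Python. It is not the authors' implementation. It assumes that job (block b, denoising step s) depends only on (b, s - 1) and (b - 1, s), which is one plausible reading of "conditioning at the same noise level"; the function name `wavefront_schedule` is hypothetical.

```python
from typing import List, Tuple

Job = Tuple[int, int]  # (block index, denoising step index)


def wavefront_schedule(num_blocks: int, num_steps: int) -> List[List[Job]]:
    """Group jobs into ticks so that jobs sharing a tick are independent.

    Assumed dependencies (not taken from the paper): job (b, s) needs
    (b, s - 1) and (b - 1, s). Under that assumption the earliest tick
    at which (b, s) can run is b + s.
    """
    ticks: List[List[Job]] = [[] for _ in range(num_blocks + num_steps - 1)]
    for b in range(num_blocks):
        for s in range(num_steps):
            ticks[b + s].append((b, s))
    return ticks


if __name__ == "__main__":
    schedule = wavefront_schedule(num_blocks=4, num_steps=4)
    for t, jobs in enumerate(schedule):
        print(f"tick {t}: {jobs}")
    # Sequential execution needs num_blocks * num_steps ticks;
    # the idealized wavefront needs num_blocks + num_steps - 1.
    print(len(schedule), "ticks instead of", 4 * 4)
```

The tick count is an idealized upper bound on parallelism: it assumes one device per concurrent job and ignores communication cost, so it should not be read as the source of the reported 1.8x figure.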
Abstract
Autoregressive (AR) diffusion offers a promising framework for generating videos of theoretically infinite length. However, a major challenge is maintaining temporal continuity while preventing the progressive quality degradation caused by error accumulation. To ensure continuity, existing methods typically condition on highly denoised contexts; yet, this practice propagates prediction errors with high certainty, thereby exacerbating degradation. In this paper, we argue that a highly clean context is unnecessary. Drawing inspiration from bidirectional diffusion models, which denoise frames at a shared noise level while maintaining coherence, we propose that conditioning on context at the same noise level as the current block provides sufficient signal for temporal consistency while effectively mitigating error propagation. Building on this insight, we propose HiAR, a hierarchical…
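The contrast the abstract draws between clean-context and same-noise-level conditioning can be sketched by tracking noise levels only. This is a toy illustration in Python, not the paper's algorithm: the function names are hypothetical, noise levels are integers (`num_steps` means pure noise, 0 means clean), and "context" is assumed to be the state of earlier blocks at the start of the current level.

```python
from typing import List, Tuple

# One trace entry: (block, step, noise levels of the context blocks)
Trace = List[Tuple[int, int, List[int]]]


def block_first(num_blocks: int, num_steps: int) -> Trace:
    """Conventional AR order: fully denoise block b, then move to b + 1."""
    level = [num_steps] * num_blocks
    trace: Trace = []
    for b in range(num_blocks):
        for s in range(num_steps):
            # Earlier blocks are already fully denoised (level 0).
            trace.append((b, s, level[:b]))
            level[b] -= 1
    return trace


def level_first(num_blocks: int, num_steps: int) -> Trace:
    """Assumed hierarchical order: at each noise level, sweep blocks causally."""
    level = [num_steps] * num_blocks
    trace: Trace = []
    for s in range(num_steps):
        snapshot = list(level)  # block states at the start of this level
        for b in range(num_blocks):
            # Context shares the current block's noise level.
            trace.append((b, s, snapshot[:b]))
            level[b] -= 1
    return trace


if __name__ == "__main__":
    for name, fn in [("block-first", block_first), ("level-first", level_first)]:
        print(name)
        for b, s, ctx in fn(3, 4):
            if b == 2:
                print(f"  block {b}, step {s}: context levels {ctx}")
```

In the block-first trace every context level is 0, matching the "highly denoised contexts" the abstract attributes to existing methods; in the level-first trace the context level always equals the current block's level.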
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image and Video Quality Assessment · Image Enhancement Techniques
