Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
Shuhuai Ren, Shuming Ma, Xu Sun, Furu Wei

TL;DR
This paper introduces Next-Block Prediction, a semi-autoregressive framework for video generation that uses block-based bidirectional attention to enable parallel token prediction, resulting in faster inference and improved quality.
Contribution
The paper proposes a novel semi-autoregressive model for video generation that reduces inference steps and enhances quality through block-based bidirectional attention.
Findings
Achieves state-of-the-art FVD scores of 103.3 on UCF101 and 25.5 on K600.
Generates 8.89 frames per second at 128x128 resolution, an 11x speedup over next-token-prediction (NTP) decoding.
Scaling the model from 700M to 3B parameters substantially lowers FVD (lower is better).
Abstract
Next-Token Prediction (NTP) is a de facto approach for autoregressive (AR) video generation, but it suffers from suboptimal unidirectional dependencies and slow inference speed. In this work, we propose a semi-autoregressive (semi-AR) framework, called Next-Block Prediction (NBP), for video generation. By uniformly decomposing video content into equal-sized blocks (e.g., rows or frames), we shift the generation unit from individual tokens to blocks, allowing each token in the current block to simultaneously predict the corresponding token in the next block. Unlike traditional AR modeling, our framework employs bidirectional attention within each block, enabling tokens to capture more robust spatial dependencies. By predicting multiple tokens in parallel, NBP models significantly reduce the number of generation steps, leading to faster and more efficient inference. Our model achieves FVD…
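The block-wise attention pattern and the reduced step count described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The function names (`block_causal_mask`, `next_block_targets`, `generation_steps`) are hypothetical, the sequence is assumed to be a flat list of video tokens split into equal-sized blocks, and details such as how the first block is conditioned are left out.

```python
# Minimal sketch of next-block prediction bookkeeping (assumed names, pure Python).

def block_causal_mask(num_tokens, block_size):
    """mask[i][j] is True when query token i may attend to key token j.

    Tokens attend bidirectionally inside their own block and causally
    to every earlier block. block_size=1 gives the usual causal mask.
    """
    return [
        [(j // block_size) <= (i // block_size) for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


def next_block_targets(num_tokens, block_size):
    """Target index for each position: same offset, one block ahead."""
    return [i + block_size for i in range(num_tokens - block_size)]


def generation_steps(num_tokens, block_size):
    """Decoding steps needed when one block is emitted per step (ceil division)."""
    return -(-num_tokens // block_size)


if __name__ == "__main__":
    # Six tokens in two blocks of three.
    for row in block_causal_mask(6, 3):
        print("".join("1" if allowed else "0" for allowed in row))
    # Token 0 predicts token 3, token 1 predicts token 4, and so on.
    print(next_block_targets(6, 3))
    # Token-by-token decoding versus block-by-block decoding.
    print(generation_steps(6, 1), generation_steps(6, 3))
```

With a block size of 1 the mask reduces to standard causal attention and the step count equals the token count, which is the NTP setting. Larger blocks trade strictly sequential dependencies for fewer decoding steps.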
Taxonomy
Topics: Machine Learning and Data Classification · Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare
Methods: Softmax · Attention Is All You Need
