TempoMaster: Efficient Long Video Generation via Next-Frame-Rate Prediction

Yukuo Ma; Cong Liu; Junke Wang; Junqi Liu; Haibin Huang; Zuxuan Wu; Chi Zhang; Xuelong Li

arXiv:2511.12578 · cs.CV · December 3, 2025

TL;DR

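TempoMaster frames long video generation as next-frame-rate prediction: it first generates a low-frame-rate clip as a coarse blueprint of the whole video, then progressively raises the frame rate to refine visual detail and motion, keeping long videos temporally coherent while allowing parallel synthesis.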

Contribution

The paper proposes a framework that generates long videos by progressively increasing the frame rate, using bidirectional attention within each frame-rate level and autoregression across levels to improve coherence and efficiency.

Findings

01 Sets a new state of the art in long video generation

02 Achieves superior visual and temporal quality

03 Enables efficient parallel synthesis of videos

Abstract

We present TempoMaster, a novel framework that formulates long video generation as next-frame-rate prediction. Specifically, we first generate a low-frame-rate clip that serves as a coarse blueprint of the entire video sequence, and then progressively increase the frame rate to refine visual details and motion continuity. During generation, TempoMaster employs bidirectional attention within each frame-rate level while performing autoregression across frame rates, thus achieving long-range temporal coherence while enabling efficient and parallel synthesis. Extensive experiments demonstrate that TempoMaster establishes a new state-of-the-art in long video generation, excelling in both visual and temporal quality.
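The coarse-to-fine scheme described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the stride schedule `[8, 4, 2, 1]`, the function names `frame_rate_levels` and `level_attention_mask`, and the reading of "bidirectional within a level, autoregressive across levels" as a block attention mask in which each frame may attend to all frames of its own level and of earlier (lower-frame-rate) levels.

```python
from typing import List


def frame_rate_levels(num_frames: int, strides: List[int]) -> List[List[int]]:
    """Return, per level, the frame indices newly generated at that level.

    `strides` is a hypothetical coarse-to-fine schedule, e.g. [8, 4, 2, 1]:
    level 0 keeps every 8th frame, and each later level adds the frames
    that its smaller stride introduces on top of all earlier levels.
    """
    seen = set()
    levels = []
    for stride in strides:
        new = [t for t in range(0, num_frames, stride) if t not in seen]
        seen.update(new)
        levels.append(new)
    return levels


def level_attention_mask(levels: List[List[int]]) -> List[List[bool]]:
    """Build a block mask over frames ordered level by level.

    mask[i][j] is True when frame-token i may attend to frame-token j:
    bidirectional inside a level, and only toward earlier levels across
    levels (an assumed interpretation, not the paper's stated design).
    """
    level_of = [k for k, frames in enumerate(levels) for _ in frames]
    n = len(level_of)
    return [[level_of[j] <= level_of[i] for j in range(n)] for i in range(n)]


# Illustrative values only: a 16-frame video refined over four levels.
levels = frame_rate_levels(num_frames=16, strides=[8, 4, 2, 1])
mask = level_attention_mask(levels)
```

With these illustrative values, the first level holds two widely spaced frames that span the whole clip, and each later level fills in the frames between those already generated. Because all frames within a level are produced together, the number of sequential steps grows with the number of levels rather than with the number of frames.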

Code & Models

Models

Hugging Face model: Scottttttyy/TempoMaster

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Coding and Compression Technologies · Advanced Vision and Imaging