MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation
Michael Fuest, Vincent Tao Hu, Björn Ommer

TL;DR
MaskFlow introduces a novel framework combining discrete representations and flow-matching for efficient, high-quality long video generation, capable of producing videos ten times longer than training sequences with flexible sampling modes.
Contribution
The paper presents MaskFlow, a unified approach that combines frame-level masking with flow-matching over discrete representations to generate long videos efficiently, reaching lengths ten times that of the training sequences.
Findings
Achieves competitive FVD scores on FFS and DMLab datasets.
Supports both autoregressive and full-sequence generation modes.
Demonstrates training-free application to both timestep-dependent and timestep-independent models.
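The two generation modes listed above can be pictured as different schedules over frame indices. The sketch below is only an illustration under stated assumptions, not the authors' sampler: the function name rollout_schedule and the parameters window (frames the model sees at once) and context (previously generated frames kept unmasked) are hypothetical, and the first window is assumed to be generated in one full-sequence pass.

```python
def rollout_schedule(total_frames, window, context):
    """Return a list of (context_frames, new_frames) index lists.

    Hypothetical helper: each step conditions on up to `context` already
    generated frames and fills the rest of a `window`-frame sequence.
    """
    if not 0 <= context < window:
        raise ValueError("context must satisfy 0 <= context < window")
    steps = []
    generated = 0
    while generated < total_frames:
        ctx = min(context, generated)  # no context exists for the first window
        new = min(window - ctx, total_frames - generated)
        context_frames = list(range(generated - ctx, generated))
        new_frames = list(range(generated, generated + new))
        steps.append((context_frames, new_frames))
        generated += new
    return steps


# Chunked rollout: 16-frame window, 8 context frames, 160 frames in total
# (ten times the assumed window length).
chunked = rollout_schedule(total_frames=160, window=16, context=8)

# Frame-by-frame rollout: all but one frame of the window is context.
framewise = rollout_schedule(total_frames=20, window=16, context=15)
```

A larger context means fewer new frames per step, so the frame-by-frame schedule needs more sampling steps than the chunked one for the same target length.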
Abstract
Generating long, high-quality videos remains a challenge due to the complex interplay of spatial and temporal dynamics and hardware limitations. In this work, we introduce MaskFlow, a unified video generation framework that combines discrete representations with flow-matching to enable efficient generation of high-quality long videos. By leveraging a frame-level masking strategy during training, MaskFlow conditions on previously generated unmasked frames to generate videos ten times longer than the training sequences. MaskFlow does so efficiently by enabling fast Masked Generative Model (MGM)-style sampling, and it can be deployed in both fully autoregressive and full-sequence generation modes. We validate the quality of our method on the FaceForensics (FFS) and DeepMind Lab (DMLab) datasets and report Fréchet Video Distance (FVD) competitive with…
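As a rough illustration of the frame-level masking mentioned in the abstract, the sketch below masks whole frames of a tokenized video rather than individual tokens. It is a minimal sketch under assumptions, not the paper's training code: MASK_ID, mask_frames, and the fixed mask_ratio are hypothetical, and the abstract does not say how the masking ratio is chosen.

```python
import random

MASK_ID = -1  # hypothetical mask token id; real codebook ids are assumed non-negative


def mask_frames(video_tokens, mask_ratio, rng):
    """Replace every token of randomly chosen frames with MASK_ID.

    video_tokens: list of frames, each a list of discrete token ids.
    Returns the masked copy and the set of masked frame indices.
    """
    num_frames = len(video_tokens)
    num_masked = round(mask_ratio * num_frames)
    masked_frames = set(rng.sample(range(num_frames), num_masked))
    masked_video = [
        [MASK_ID] * len(frame) if i in masked_frames else list(frame)
        for i, frame in enumerate(video_tokens)
    ]
    return masked_video, masked_frames


rng = random.Random(0)
# Toy clip: 8 frames of 4 tokens each, drawn from an assumed 1024-entry codebook.
video_tokens = [[rng.randrange(1024) for _ in range(4)] for _ in range(8)]
masked_video, masked_frames = mask_frames(video_tokens, mask_ratio=0.5, rng=rng)
```

Frames left unmasked keep their original tokens, so they can act as conditioning context while the model predicts the masked frames.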
Taxonomy
Topics: Video Coding and Compression Technologies · Advanced Vision and Imaging · Image and Video Quality Assessment
