STARFlow-V: End-to-End Video Generative Modeling with Normalizing Flows
Jiatao Gu, Ying Shen, Tianrong Chen, Laurent Dinh, Yuyang Wang, Miguel Angel Bautista, David Berthelot, Josh Susskind, Shuangfei Zhai

TL;DR
STARFlow-V is a normalizing flow-based approach to end-to-end video generation. It produces high-quality, temporally consistent videos with efficient sampling, challenging the dominance of diffusion models in this domain.
Contribution
The paper presents STARFlow-V, a novel normalizing flow-based video generator with a global-local architecture, flow-score matching, and parallelizable sampling, enabling high-quality autoregressive video synthesis.
Findings
Achieves strong visual fidelity and temporal consistency.
Supports multiple generation tasks including text-to-video.
Provides practical sampling throughput relative to diffusion-based baselines.
Abstract
Normalizing flows (NFs) are end-to-end likelihood-based generative models for continuous data, and have recently regained attention with encouraging progress on image generation. Yet in the video generation domain, where spatiotemporal complexity and computational cost are substantially higher, state-of-the-art systems almost exclusively rely on diffusion-based models. In this work, we revisit this design space by presenting STARFlow-V, a normalizing flow-based video generator with substantial benefits such as end-to-end learning, robust causal prediction, and native likelihood estimation. Building upon the recently proposed STARFlow, STARFlow-V operates in the spatiotemporal latent space with a global-local architecture which restricts causal dependencies to a global latent space while preserving rich local within-frame interactions. This eases error accumulation over time, a common…
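To make "native likelihood estimation" and "causal prediction" concrete, here is a minimal sketch of a toy causal affine flow over a short sequence of frame latents. It is not the STARFlow-V architecture: the mean-pooled context, the `conditioner` function, the weights `W_mu` and `W_s`, and the tiny shapes are all assumptions chosen for illustration. It only shows the general change-of-variables idea, where each frame is transformed conditioned on earlier frames and the exact log-likelihood is the prior log-density plus the log-determinant.

```python
import numpy as np

def context(x_prev, D):
    # Hypothetical summary of past frames: mean over time (zeros if no past).
    if x_prev.shape[0] == 0:
        return np.zeros(D)
    return x_prev.mean(axis=0)

def conditioner(ctx, W_mu, W_s):
    # Toy conditioner (an assumption, not the paper's network):
    # maps the causal context to a shift and a bounded log-scale.
    mu = np.tanh(ctx @ W_mu)
    log_scale = 0.5 * np.tanh(ctx @ W_s)
    return mu, log_scale

def forward(x, W_mu, W_s):
    # Data -> noise. Frame t depends only on frames < t (causal).
    T, D = x.shape
    z = np.empty_like(x)
    log_det = 0.0
    for t in range(T):
        mu, log_scale = conditioner(context(x[:t], D), W_mu, W_s)
        z[t] = (x[t] - mu) * np.exp(-log_scale)
        log_det -= log_scale.sum()  # log |det dz/dx| of the affine map
    return z, log_det

def inverse(z, W_mu, W_s):
    # Noise -> data. Frames are generated one after another (autoregressive).
    T, D = z.shape
    x = np.empty_like(z)
    for t in range(T):
        mu, log_scale = conditioner(context(x[:t], D), W_mu, W_s)
        x[t] = z[t] * np.exp(log_scale) + mu
    return x

def log_likelihood(x, W_mu, W_s):
    # Exact log p(x) = log N(z; 0, I) + log |det dz/dx|.
    z, log_det = forward(x, W_mu, W_s)
    log_prior = -0.5 * (z ** 2).sum() - 0.5 * z.size * np.log(2 * np.pi)
    return log_prior + log_det

rng = np.random.default_rng(0)
T, D = 4, 3  # 4 toy "frames", each a 3-dimensional latent
W_mu = rng.normal(size=(D, D))
W_s = rng.normal(size=(D, D))
x = rng.normal(size=(T, D))

z, log_det = forward(x, W_mu, W_s)
x_rec = inverse(z, W_mu, W_s)
ll = log_likelihood(x, W_mu, W_s)
```

In this sketch the forward pass could be computed for all frames at once because every input frame is known, while the inverse pass must proceed frame by frame. That sequential cost is the kind of bottleneck that parallelizable sampling schemes aim to reduce.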
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks
