FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

Shilong Zhang; Wenbo Li; Shoufa Chen; Chongjian GE; Peize Sun; Yifu Zhang; Yi Jiang; Zehuan Yuan; Bingyue Peng; Ping Luo

arXiv:2502.05179·cs.CV·February 2, 2026

TL;DR

FlashVideo introduces a two-stage framework for high-resolution video generation that balances fidelity and efficiency, enabling detailed outputs with reduced computational costs and faster preview capabilities.

Contribution

A novel two-stage approach that strategically allocates resources to improve high-resolution video quality and computational efficiency in text-to-video generation.

Findings

01. Achieves state-of-the-art high-resolution video quality

02. Reduces computational costs significantly

03. Enables a quick low-resolution preview and adjustment before full-resolution generation

Abstract

DiT models have achieved great success in text-to-video generation, leveraging their scalability in model capacity and data scale. High content and motion fidelity aligned with text prompts, however, often requires a large number of model parameters and a substantial number of function evaluations (NFEs). Realistic and visually appealing details are typically reflected in high-resolution outputs, further amplifying computational demands, especially for single-stage DiT models. To address these challenges, we propose a novel two-stage framework, FlashVideo, which strategically allocates model capacity and NFEs across stages to balance generation fidelity and quality. In the first stage, prompt fidelity is prioritized through a low-resolution generation process utilizing large parameters and sufficient NFEs to enhance computational efficiency. The second stage achieves a nearly straight ODE trajectory…
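
The abstract is truncated where it mentions a nearly straight ODE trajectory, so the following is only a toy illustration of the general numerical point, not FlashVideo's method or code. It is a minimal sketch, assuming a one-dimensional state and two hand-written velocity fields: one whose path from start to end is a straight line, and one whose path is curved. All names (`euler_integrate`, `straight_velocity`, `curved_velocity`, `X_START`, `X_END`) are hypothetical. It shows why a straighter trajectory can be integrated accurately with fewer function evaluations.

```python
import math

# Hypothetical endpoints of the trajectory (toy 1-D values, not video latents).
X_START, X_END = 0.0, 1.0


def euler_integrate(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    Each step costs one velocity evaluation, so num_steps plays the role of NFEs.
    """
    x = x0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity(x, t)
    return x


def straight_velocity(x, t):
    # Path x(t) = (1 - t) * X_START + t * X_END has a constant velocity.
    return X_END - X_START


def curved_velocity(x, t):
    # Path x(t) = X_START + (X_END - X_START) * sin(pi * t / 2):
    # same endpoints, but the velocity changes along the way.
    return (X_END - X_START) * (math.pi / 2) * math.cos(math.pi * t / 2)


if __name__ == "__main__":
    for steps in (1, 4, 50):
        err_straight = abs(euler_integrate(straight_velocity, X_START, steps) - X_END)
        err_curved = abs(euler_integrate(curved_velocity, X_START, steps) - X_END)
        print(f"steps={steps:3d}  straight error={err_straight:.4f}  curved error={err_curved:.4f}")
```

With a constant velocity, a single Euler step already lands on the endpoint, while the curved path needs many more steps to reach comparable accuracy. A learned velocity field in a real model is only approximately constant, so the saving in practice would be smaller than in this idealized case.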

Code & Models

Repositories

foundationvision/flashvideo (PyTorch, official)

Models

🤗 FoundationVision/FlashVideo (model, ♡ 13)

Taxonomy

Topics: Video Analysis and Summarization · Advanced Vision and Imaging · Video Coding and Compression Technologies

Methods: Diffusion