FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
Shilong Zhang, Wenbo Li, Shoufa Chen, Chongjian GE, Peize Sun, Yifu Zhang, Yi Jiang, Zehuan Yuan, Bingyue Peng, Ping Luo

TL;DR
FlashVideo introduces a two-stage framework for high-resolution video generation that balances fidelity and efficiency, enabling detailed outputs with reduced computational costs and faster preview capabilities.
Contribution
A novel two-stage approach that strategically allocates resources to improve high-resolution video quality and computational efficiency in text-to-video generation.
Findings
Achieves state-of-the-art high-resolution video quality
Reduces computational costs significantly
Enables prompt preview and adjustment before full-resolution generation
Abstract
DiT models have achieved great success in text-to-video generation, leveraging their scalability in model capacity and data scale. High content and motion fidelity aligned with text prompts, however, often require large model parameters and a substantial number of function evaluations (NFEs). Realistic and visually appealing details are typically reflected in high-resolution outputs, further amplifying computational demands, especially for single-stage DiT models. To address these challenges, we propose a novel two-stage framework, FlashVideo, which strategically allocates model capacity and NFEs across stages to balance generation fidelity and quality. In the first stage, prompt fidelity is prioritized through a low-resolution generation process utilizing large parameters and sufficient NFEs to enhance computational efficiency. The second stage achieves a nearly straight ODE trajectory…
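The abstract's link between a nearly straight ODE trajectory and a small NFE budget can be illustrated with a toy example. The sketch below is not the authors' method or code: it uses a scalar state and two hypothetical velocity fields (`straight_velocity` and `curved_velocity`, both invented here) purely to show that a fixed-step Euler solver recovers a straight trajectory exactly in one function evaluation, while a curved trajectory needs many more.

```python
import math


def euler_integrate(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    Returns the final state and the number of function evaluations (NFEs).
    """
    x = x0
    dt = 1.0 / num_steps
    nfe = 0
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity(x, t)  # one velocity evaluation per step
        nfe += 1
    return x, nfe


def straight_velocity(x, t):
    # Hypothetical constant velocity: the trajectory is a straight line.
    return 2.0


def curved_velocity(x, t):
    # Hypothetical state-dependent velocity: exact solution is x0 * exp(t).
    return x


# Straight trajectory: a single Euler step lands exactly on the endpoint.
x_straight, nfe_straight = euler_integrate(straight_velocity, 1.0, 1)

# Curved trajectory: one step is far off; many steps are needed to get close.
x_curved_1, nfe_curved_1 = euler_integrate(curved_velocity, 1.0, 1)
x_curved_50, nfe_curved_50 = euler_integrate(curved_velocity, 1.0, 50)

print("straight, 1 NFE:", x_straight)
print("curved, 1 NFE, abs error:", abs(x_curved_1 - math.e))
print("curved, 50 NFEs, abs error:", abs(x_curved_50 - math.e))
```

In a real video model each function evaluation is a full network forward pass, so the number of solver steps dominates inference cost; this toy only captures the numerical intuition, not the model, data, or training procedure described in the paper.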
Taxonomy
Topics: Video Analysis and Summarization · Advanced Vision and Imaging · Video Coding and Compression Technologies
Methods: Diffusion
