SwiftVideo: A Unified Framework for Few-Step Video Generation through Trajectory-Distribution Alignment
Yanxiao Sun, Jiafu Wu, Yun Cao, Chengming Xu, Yabiao Wang, Weijian Cao, Donghao Luo, Chengjie Wang, Yanwei Fu

TL;DR
SwiftVideo is a novel distillation framework that combines trajectory-preserving and distribution-matching strategies to enable high-quality, few-step video generation with reduced computational cost.
Contribution
It introduces continuous-time consistency distillation and dual-perspective alignment to improve stability and performance in few-step video synthesis.
Findings
Outperforms existing methods on the OpenVid-1M benchmark
Maintains high video quality with fewer inference steps
Reduces computational overhead significantly
Abstract
Diffusion-based or flow-based models have achieved significant progress in video synthesis but require multiple iterative sampling steps, which incurs substantial computational overhead. While many distillation methods that are solely based on trajectory-preserving or distribution-matching have been developed to accelerate video generation models, these approaches often suffer from performance breakdown or increased artifacts under few-step settings. To address these limitations, we propose SwiftVideo, a unified and stable distillation framework that combines the advantages of trajectory-preserving and distribution-matching strategies. Our approach introduces continuous-time consistency distillation to ensure precise preservation of ODE trajectories. Subsequently, we propose a dual-perspective alignment that includes distribution alignment between synthetic and real data…
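The cost argument in the abstract can be illustrated with a toy example. The sketch below is not SwiftVideo's method or code: it assumes a hypothetical one-dimensional ODE dx/dt = x as a stand-in for a learned velocity field and integrates it with a plain Euler sampler (the names euler_sample and toy_velocity are invented for illustration). It shows only the general trade-off that motivates distillation: each sampling step costs one model evaluation, and naively cutting the step count makes the sample drift away from the endpoint of the true ODE trajectory.

```python
import math


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.

    Returns the final state and the number of velocity evaluations,
    which stands in for the number of network forward passes.
    """
    x = x0
    dt = 1.0 / num_steps
    evals = 0
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity(x, t)  # one "model call" per step
        evals += 1
    return x, evals


def toy_velocity(x, t):
    # Hypothetical stand-in for a learned velocity field (assumption).
    # The exact solution of dx/dt = x with x(0) = 1 is x(1) = e.
    return x


exact = math.e
x_many, nfe_many = euler_sample(toy_velocity, 1.0, 50)
x_few, nfe_few = euler_sample(toy_velocity, 1.0, 4)

err_many = abs(x_many - exact)
err_few = abs(x_few - exact)

print(f"50 steps: {nfe_many} evaluations, endpoint error {err_many:.4f}")
print(f" 4 steps: {nfe_few} evaluations, endpoint error {err_few:.4f}")
```

In this toy setting the 4-step run uses far fewer evaluations but lands further from the true endpoint. A trajectory-preserving distillation method aims to train a student that reaches the endpoint accurately in those few steps rather than relying on a coarser solver.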
Taxonomy
Topics: Video Analysis and Summarization · Video Coding and Compression Technologies · Advanced Vision and Imaging
