SURF: Signature-Retained Fast Video Generation
Kaixin Ding, Xi Chen, Sihui Ji, Yuan Gao, Liang Hou, Xin Tao, Hengshuang Zhao

TL;DR
SURF is an efficient two-stage framework that accelerates high-resolution video generation while preserving the original model's signatures by combining low-res previews and a specialized Refiner.
Contribution
It introduces a novel, training-free noise reshifting technique and a mapping-based Refiner to significantly speed up high-res video generation without signature loss.
Findings
Achieves a 12.5x speedup on Wan2.1
Achieves an 8.7x speedup on HunyuanVideo
Keeps signatures close to those of the pretrained models
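To put the speedup figures in perspective, here is a small arithmetic sketch. It assumes, without confirmation from the source, that the reported 12.5x speedup applies to the baseline the abstract quotes for Wan2.1 (over 50 minutes for a single 720p video). The function name `accelerated_minutes` is hypothetical, and 50 minutes is a lower bound taken from the abstract's wording, so the result is only a rough estimate.

```python
def accelerated_minutes(baseline_minutes: float, speedup: float) -> float:
    """Return the wall-clock time after applying a multiplicative speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_minutes / speedup


# Assumption: the 12.5x figure applies to the ~50-minute 720p Wan2.1 baseline.
wan_estimate = accelerated_minutes(50.0, 12.5)
print(f"Estimated Wan2.1 generation time: {wan_estimate:.1f} min")
```

No baseline time is given for HunyuanVideo, so the same calculation cannot be repeated for the 8.7x figure without an outside measurement.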
Abstract
The demand for high-resolution video generation is growing rapidly. However, the achievable resolution is severely constrained by slow inference. For instance, Wan2.1 requires over 50 minutes to generate a single 720p video. While previous works accelerate video generation in various ways, most of them compromise the distinctive signatures (e.g., layout, semantics, motion) of the original model. In this work, we propose SURF, an efficient framework for generating high-resolution videos while preserving those signatures as much as possible. Specifically, SURF divides video generation into two stages: first, we leverage the pretrained model to infer at its optimal resolution and downsample the latent to quickly generate low-resolution previews; then we design a Refiner to upscale the preview. In the preview stage, we identify that directly inferring a model (trained with higher…
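The two-stage structure described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`downsample_latent`, `toy_refiner`, `two_stage`), the `(T, C, H, W)` latent layout, the average-pooling downsample, and the factor of 2 are all assumptions made for illustration. In particular, `toy_refiner` is only a nearest-neighbour upsampler standing in for the learned Refiner, whose details the truncated abstract does not give.

```python
import numpy as np


def downsample_latent(latent: np.ndarray, factor: int = 2) -> np.ndarray:
    """Average-pool the spatial dims of a (T, C, H, W) latent by `factor`."""
    t, c, h, w = latent.shape
    if h % factor or w % factor:
        raise ValueError("H and W must be divisible by factor")
    blocks = latent.reshape(t, c, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(3, 5))


def toy_refiner(preview: np.ndarray, factor: int = 2) -> np.ndarray:
    """Placeholder for the Refiner: nearest-neighbour spatial upsampling only."""
    return preview.repeat(factor, axis=2).repeat(factor, axis=3)


def two_stage(latent: np.ndarray, factor: int = 2):
    """Stage 1: low-resolution preview. Stage 2: upscale the preview."""
    preview = downsample_latent(latent, factor)
    refined = toy_refiner(preview, factor)
    return preview, refined


# Random array standing in for a latent produced by a pretrained model.
rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 16, 8, 12))
preview, refined = two_stage(latent)
print(preview.shape, refined.shape)
```

The point of the sketch is the data flow only: the preview carries a quarter of the spatial positions of the original latent, and the second stage restores the original shape.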
Taxonomy
Topics: Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
