SF-V: Single Forward Video Generation Model

Zhixing Zhang; Yanyu Li; Yushu Wu; Yanwu Xu; Anil Kag; Ivan Skorokhodov; Willi Menapace; Aliaksandr Siarohin; Junli Cao; Dimitris Metaxas; Sergey Tulyakov; Jian Ren

arXiv:2406.04324 · cs.CV · October 28, 2024 · 1 citation

TL;DR

This paper introduces SF-V, a video generation model that synthesizes a video in a single forward pass. It uses adversarial training to turn a pre-trained diffusion-based video model into a real-time video synthesizer that keeps quality high while substantially reducing computational cost.

Contribution

The paper presents a novel adversarial training approach to fine-tune pre-trained diffusion models for single-step, high-quality video generation, enabling real-time applications.

Findings

01

Achieves around a 23x speedup over SVD and a 6x speedup over existing methods.

02

Maintains competitive video quality with significantly reduced computation.

03

Enables real-time video synthesis and editing.
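
The speedups above come from reducing how many times the denoising network must run per video. The sketch below is a toy illustration under stated assumptions, not SF-V or SVD code: `CountingDenoiser`, `make_sigmas`, `sample_multistep`, and `sample_single_step` are hypothetical names, a list of floats stands in for a latent video, the iterative sampler is a generic Euler sampler over a linear noise schedule, and 25 steps is an arbitrary illustrative choice. It only counts network calls; measured wall-clock speedups such as the figures above also depend on costs outside the denoiser and on hardware.

```python
from typing import List

Video = List[float]  # hypothetical stand-in for a latent video tensor


class CountingDenoiser:
    """Hypothetical denoising network that counts its forward passes."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: Video, sigma: float) -> Video:
        self.calls += 1
        # Toy "denoised estimate": shrink the input more at higher noise levels.
        return [v / (1.0 + sigma * sigma) for v in x]


def make_sigmas(num_steps: int, sigma_max: float) -> List[float]:
    """Linear noise schedule from sigma_max down to 0 (num_steps + 1 levels)."""
    return [sigma_max * (1.0 - i / num_steps) for i in range(num_steps + 1)]


def sample_multistep(denoiser: CountingDenoiser, x: Video, sigmas: List[float]) -> Video:
    """Generic Euler sampler: one network call per step."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoiser(x, sigma)
        # Step along d = (x - denoised) / sigma from sigma to sigma_next.
        x = [xi + (sigma_next - sigma) * (xi - di) / sigma for xi, di in zip(x, denoised)]
    return x


def sample_single_step(denoiser: CountingDenoiser, x: Video, sigma_max: float) -> Video:
    """Single forward pass: map pure noise directly to a sample."""
    return denoiser(x, sigma_max)


noise = [0.3, -1.2, 0.8]
multi, single = CountingDenoiser(), CountingDenoiser()
sample_multistep(multi, noise, make_sigmas(25, 80.0))
sample_single_step(single, noise, 80.0)
print(multi.calls, single.calls)  # 25 1
```

The multi-step sampler pays one network evaluation per noise level, so its cost grows linearly with the step count; the single-step generator pays for exactly one.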

Abstract

Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained video diffusion models. We show that, through adversarial training, a multi-step video diffusion model, namely Stable Video Diffusion (SVD), can be trained to synthesize high-quality videos in a single forward pass, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method achieves competitive generation quality of synthesized videos with significantly reduced computational overhead for the denoising process…
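
To make the training idea concrete, here is a deliberately tiny, hypothetical sketch of adversarial fine-tuning for a one-step generator. It is not SF-V's method: this summary does not specify the generator, discriminator, or loss, so everything here is an assumption. The "video" is a scalar, the generator `x = a * z + b` stands in for a pre-trained network that maps noise to a sample in one forward pass, the discriminator is a linear logit `u * x + c`, and training uses the standard non-saturating GAN loss plus an R1 gradient penalty (a common stabilizer, included here as an assumption). The function name `train_toy_adversarial` and all hyperparameters are hypothetical, and gradients are written out by hand so the block runs with only the Python standard library.

```python
import math
import random
from typing import Tuple


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_adversarial(
    steps: int = 2000,
    lr: float = 0.05,
    r1_gamma: float = 1.0,
    batch: int = 64,
    seed: int = 0,
) -> Tuple[float, float, float, float]:
    rng = random.Random(seed)
    real_mean, real_std = 3.0, 1.0  # hypothetical statistics of the "real" data
    a, b = 1.0, 0.0                 # hypothetical pre-trained one-step generator: x = a*z + b
    u, c = 0.0, 0.0                 # linear discriminator logit: D(x) = u*x + c

    for _ in range(steps):
        reals = [rng.gauss(real_mean, real_std) for _ in range(batch)]
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fakes = [a * z + b for z in zs]  # one generator forward pass per sample

        # Discriminator update: minimize E[softplus(-D(real))] + E[softplus(D(fake))]
        # plus the R1 penalty (gamma/2) * |dD/dx|^2 on real data, here (gamma/2) * u^2.
        grad_u = grad_c = 0.0
        for r, f in zip(reals, fakes):
            w_real = -sigmoid(-(u * r + c))  # d softplus(-D) / dD
            w_fake = sigmoid(u * f + c)      # d softplus(D) / dD
            grad_u += w_real * r + w_fake * f
            grad_c += w_real + w_fake
        u -= lr * (grad_u / batch + r1_gamma * u)
        c -= lr * (grad_c / batch)

        # Generator update (non-saturating loss): minimize E[softplus(-D(G(z)))].
        grad_a = grad_b = 0.0
        for z, f in zip(zs, fakes):
            dloss_df = -sigmoid(-(u * f + c)) * u  # chain rule through D(x) = u*x + c
            grad_a += dloss_df * z
            grad_b += dloss_df
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch

    return a, b, u, c


a, b, u, c = train_toy_adversarial()
print(f"generator: a={a:.3f}, b={b:.3f}  discriminator: u={u:.3f}, c={c:.3f}")
```

Because this discriminator is linear, it can only detect a mismatch in the mean, so the toy generator is pushed toward the real mean but learns nothing about the spread. A discriminator for video has to judge spatial and temporal structure, which is the harder problem the abstract describes.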

Code & Models

Repositories

snap-research/SF-V (official)

Taxonomy

Topics: Image and Video Quality Assessment

Methods: Diffusion