SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
Yushu Wu, Zhixing Zhang, Yanyu Li, Yanwu Xu, Anil Kag, Yang Sui, Huseyin Coskun, Ke Ma, Aleksei Lebedev, Ju Hu, Dimitris Metaxas, Yanzhi Wang, Sergey Tulyakov, Jian Ren

TL;DR
This paper introduces SnapGen-V, a highly efficient diffusion-based video generation model that can produce a 5-second video on a mobile device in just 5 seconds, making high-quality video synthesis accessible on edge devices.
Contribution
We develop a compact, efficient diffusion model with a specialized architecture and adversarial fine-tuning, enabling fast, high-quality video generation on mobile hardware.
Findings
Generates a 5-second video on an iPhone 16 Pro Max within 5 seconds.
Reduces denoising steps to 4 for efficiency.
Achieves quality comparable to server-side models while running much faster.
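To make the "4 denoising steps" finding concrete, here is a minimal sketch of a few-step deterministic sampler. It is not the paper's sampler: the noise schedule `SIGMAS`, the function names, and the `toy_denoiser` (the ideal denoiser for unit-Gaussian data, standing in for a trained video network) are all assumptions chosen for illustration. The point is only that the step count equals the number of network calls, which is what dominates on-device latency.

```python
import random

def toy_denoiser(x, sigma):
    """Hypothetical stand-in for a trained network.

    For data drawn from a unit Gaussian, the ideal clean-sample estimate
    at noise level sigma is x / (1 + sigma^2). A real model is learned.
    """
    return [v / (1.0 + sigma * sigma) for v in x]

def few_step_sample(denoiser, sigmas, dim, seed=0):
    """Run one denoiser call per step over a descending noise schedule.

    `sigmas` must end at 0.0; len(sigmas) - 1 is the number of steps.
    """
    rng = random.Random(seed)
    # Start from pure noise at the highest noise level.
    x = [rng.gauss(0.0, sigmas[0]) for _ in range(dim)]
    calls = 0
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0 = denoiser(x, sigma)  # predicted clean sample
        calls += 1
        # Euler step from noise level sigma down to sigma_next.
        ratio = sigma_next / sigma
        x = [c + ratio * (n - c) for n, c in zip(x, x0)]
    return x, calls

# Illustrative schedule with 4 steps (values are assumptions, not from the paper).
SIGMAS = [8.0, 4.0, 2.0, 1.0, 0.0]
sample, n_calls = few_step_sample(toy_denoiser, SIGMAS, dim=6)
print(n_calls, sample)
```

With a schedule of five noise levels the loop makes exactly four denoiser calls, so halving the schedule length roughly halves sampling cost for a fixed network.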
Abstract
We have witnessed the unprecedented success of diffusion-based video generation over the past year. Recently proposed models from the community have wielded the power to generate cinematic and high-resolution videos with smooth motions from arbitrary input prompts. However, as a supertask of image generation, video generation models require more computation and are thus hosted mostly on cloud servers, limiting broader adoption among content creators. In this work, we propose a comprehensive acceleration framework to bring the power of the large-scale video diffusion model to the hands of edge users. From the network architecture scope, we initialize from a compact image backbone and search out the design and arrangement of temporal layers to maximize hardware efficiency. In addition, we propose a dedicated adversarial fine-tuning algorithm for our efficient model and reduce the…
Taxonomy
Topics: Multimedia Communication and Technology · Interactive and Immersive Displays
Methods: Diffusion
