Mobile Video Diffusion

Haitam Ben Yahia; Denis Korzhenkov; Ioannis Lelekas; Amir Ghodrati; Amirhossein Habibian

arXiv:2412.07583 · cs.CV · December 11, 2024

TL;DR

This paper presents MobileVD, a mobile-optimized video diffusion model that sharply reduces memory and computational cost at a slight quality drop, generating latents for a 14x512x256 px clip in 1.7 seconds on a Xiaomi-14 Pro.

Contribution

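Introduces MobileVD, described by the authors as the first mobile-optimized video diffusion model. It is derived from the spatio-temporal UNet of Stable Video Diffusion (SVD) through reduced frame resolution, multi-scale temporal representations, two pruning schemes (for channels and for temporal blocks), and adversarial finetuning that reduces denoising to a single step.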

Findings

01

MobileVD is reported as 523x more efficient than the SVD model it starts from (1817.2 vs. 4.34 TFLOPs).

02

Generates latents for a 14x512x256 px clip in 1.7 seconds on a Xiaomi-14 Pro.

03

Incurs a slight quality drop: the abstract reports FVD 149 vs. 171, where lower FVD is better.

Abstract

Video diffusion models have achieved impressive realism and controllability but are limited by high computational demands, restricting their use on mobile devices. This paper introduces the first mobile-optimized video diffusion model. Starting from a spatio-temporal UNet from Stable Video Diffusion (SVD), we reduce memory and computational cost by reducing the frame resolution, incorporating multi-scale temporal representations, and introducing two novel pruning schemes to reduce the number of channels and temporal blocks. Furthermore, we employ adversarial finetuning to reduce the denoising to a single step. Our model, coined MobileVD, is 523x more efficient (1817.2 vs. 4.34 TFLOPs) with a slight quality drop (FVD 149 vs. 171), generating latents for a 14x512x256 px clip in 1.7 seconds on a Xiaomi-14 Pro. Our results are available at…
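The abstract mentions pruning to reduce the number of channels but does not describe the scheme itself. As a rough illustration of what channel pruning means in general, the sketch below uses plain magnitude-based (L1-norm) pruning, a generic technique swapped in here for illustration and not the paper's method. The function names `l1_norm` and `prune_output_channels`, the `keep_ratio` parameter, and the toy weights are all hypothetical.

```python
def l1_norm(values):
    """Sum of absolute values over an arbitrarily nested list of numbers."""
    if isinstance(values, (int, float)):
        return abs(values)
    return sum(l1_norm(v) for v in values)


def prune_output_channels(weight, keep_ratio):
    """Keep the output channels whose filters have the largest L1 norm.

    weight: nested list with one entry (filter) per output channel.
    keep_ratio: fraction of output channels to keep (hypothetical knob).
    Returns (pruned_weight, kept_indices), with indices in original order.
    """
    n_keep = max(1, int(len(weight) * keep_ratio))
    # Score each output channel by the L1 norm of its filter.
    scores = [l1_norm(filt) for filt in weight]
    # Rank channels from most to least important, then keep the top n_keep.
    ranked = sorted(range(len(weight)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [weight[i] for i in kept], kept


# Toy layer: 4 output channels, each with 2 input weights (made-up numbers).
toy_weight = [[0.1, -0.2], [1.0, 2.0], [-0.05, 0.0], [0.5, -1.5]]
pruned, kept = prune_output_channels(toy_weight, keep_ratio=0.5)
print(kept)    # [1, 3]
print(pruned)  # [[1.0, 2.0], [0.5, -1.5]]
```

In a real network, removing output channels from one layer also requires removing the matching input channels of the next layer, and the pruned model is typically finetuned afterwards to recover quality.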

Taxonomy

Topics: Advanced Data Compression Techniques

Methods: Pruning · Diffusion · Contrastive Language-Image Pre-training