MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices
Shuai Zhang, Bao Tang, Siyuan Yu, Yueting Zhu, Jingfeng Yao, Ya Zou, Shanglin Yuan, Li Yu, Wenyu Liu, Xinggang Wang

TL;DR
MobileI2V introduces a lightweight diffusion model optimized for real-time, high-resolution image-to-video generation on mobile devices, achieving significant speed improvements while maintaining quality.
Contribution
The paper presents a novel lightweight diffusion model with a hybrid attention architecture and a time-step distillation strategy tailored for mobile devices, enabling fast 720p video generation.
Findings
Real-time 720p video generation on mobile devices.
10-fold speed-up with minimal quality loss.
Frame generation in under 100 ms.
Abstract
Recently, video generation has witnessed rapid advancements, drawing increasing attention to image-to-video (I2V) synthesis on mobile devices. However, the substantial computational complexity and slow generation speed of diffusion models pose significant challenges for real-time, high-resolution video generation on resource-constrained mobile devices. In this work, we propose MobileI2V, a 270M lightweight diffusion model for real-time image-to-video generation on mobile devices. The core contributions are: (1) We analyze the performance of linear attention modules and softmax attention modules on mobile devices, and propose a linear hybrid architecture denoiser that balances generation efficiency and quality. (2) We design a time-step distillation strategy that compresses the I2V sampling steps from more than 20 to only two without significant quality loss, resulting in a 10-fold increase in…
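The abstract contrasts softmax attention with linear attention. The sketch below illustrates that general distinction in plain Python; it is not the MobileI2V denoiser. The feature map `phi` (ELU + 1) and all function names are assumptions taken from common linear-attention formulations, since the abstract does not specify the form the authors use. Softmax attention builds an n × n score matrix, whereas linear attention first aggregates keys and values into a d × d summary, so its cost grows linearly with the number of tokens n.

```python
import math

def matmul(a, b):
    # Plain nested-list matrix product: (n x m) @ (m x p) -> (n x p).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def softmax_attention(q, k, v):
    # Standard scaled dot-product attention; materializes an n x n score matrix.
    d = len(q[0])
    scores = matmul(q, transpose(k))
    out = []
    for row in scores:
        scaled = [s / math.sqrt(d) for s in row]
        m = max(scaled)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scaled]
        total = sum(w)
        w = [x / total for x in w]
        out.append([sum(wi * vi[c] for wi, vi in zip(w, v)) for c in range(len(v[0]))])
    return out

def phi(x):
    # Assumed positive feature map (ELU(x) + 1); the paper may use another.
    return x + 1.0 if x > 0 else math.exp(x)

def linear_attention(q, k, v):
    # Kernelized attention: compute phi(K)^T V once (d x d_v), then reuse it
    # for every query, avoiding the n x n score matrix.
    fq = [[phi(x) for x in row] for row in q]
    fk = [[phi(x) for x in row] for row in k]
    kv = matmul(transpose(fk), v)            # d x d_v summary of keys and values
    k_sum = [sum(col) for col in zip(*fk)]   # length-d normalizer term
    num = matmul(fq, kv)                     # n x d_v unnormalized outputs
    out = []
    for fq_row, num_row in zip(fq, num):
        z = sum(a * b for a, b in zip(fq_row, k_sum))
        out.append([x / z for x in num_row])
    return out
```

Both functions return one output row per query, and each row is a weighted average of the rows of `v`; only the way the weights are formed, and therefore the cost, differs. A hybrid design in the sense of the abstract would mix layers of the two kinds, but the actual mix used by MobileI2V is not stated in the text above.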
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Visual Attention and Saliency Detection
