MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices

Shuai Zhang; Bao Tang; Siyuan Yu; Yueting Zhu; Jingfeng Yao; Ya Zou; Shanglin Yuan; Li Yu; Wenyu Liu; Xinggang Wang

arXiv:2511.21475·cs.CV·November 27, 2025

TL;DR

MobileI2V is a 270M lightweight diffusion model for real-time, high-resolution image-to-video generation on mobile devices. It cuts sampling from more than 20 steps to two without significant quality loss.

Contribution

The paper presents a lightweight diffusion model whose denoiser uses a hybrid of linear and softmax attention, together with a time-step distillation strategy tailored for mobile devices, enabling fast 720p video generation.

Findings

01. Real-time 720p video generation on mobile devices.

02. A 10-fold speed-up with minimal quality loss, obtained by reducing the sampling steps from more than 20 to two.

03. Frame generation in under 100 ms.

Abstract

Video generation has advanced rapidly in recent years, drawing increasing attention to image-to-video (I2V) synthesis on mobile devices. However, the substantial computational complexity and slow generation speed of diffusion models pose significant challenges for real-time, high-resolution video generation on resource-constrained mobile devices. In this work, we propose MobileI2V, a 270M lightweight diffusion model for real-time image-to-video generation on mobile devices. Its core contributions are: (1) We analyze the performance of linear attention modules and softmax attention modules on mobile devices, and propose a linear hybrid architecture denoiser that balances generation efficiency and quality. (2) We design a time-step distillation strategy that compresses the I2V sampling steps from more than 20 to only two without significant quality loss, resulting in a 10-fold increase in…
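
To make the trade-off in point (1) concrete, the following is a minimal sketch, not the authors' implementation, that contrasts softmax attention with one common form of linear attention. The feature map elu(x) + 1 and all function names are assumptions chosen for illustration; the abstract does not say which linear attention variant MobileI2V uses.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_attention(q, k, v):
    """Standard attention: builds an n x n weight matrix (quadratic in n)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = []
    for row in scores:
        scaled = [s * scale for s in row]
        peak = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scaled]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return matmul(weights, v)


def feature_map(x):
    """Assumed positive feature map: elu(x) + 1."""
    return [[t + 1.0 if t > 0 else math.exp(t) for t in row] for row in x]


def linear_attention(q, k, v):
    """Linear attention: summarizes keys and values in a d x d_v matrix first."""
    q_f, k_f = feature_map(q), feature_map(k)
    kv = matmul(transpose(k_f), v)            # d x d_v, independent of query count
    k_sum = [sum(col) for col in zip(*k_f)]   # length-d normalizer
    numer = matmul(q_f, kv)
    denom = [sum(a * b for a, b in zip(row, k_sum)) for row in q_f]
    return [[x / z for x in row] for row, z in zip(numer, denom)]


def linear_attention_quadratic(q, k, v):
    """Same result as linear_attention, computed the slow n x n way for checking."""
    q_f, k_f = feature_map(q), feature_map(k)
    sim = matmul(q_f, transpose(k_f))
    weights = [[s / sum(row) for s in row] for row in sim]
    return matmul(weights, v)
```

For n tokens of dimension d, the softmax path materializes an n × n matrix, while the linear path only forms a d × d summary. This is the usual reason linear attention scales better at high resolutions, where n is large, and a hybrid design mixes the two kinds of module to trade speed against quality.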

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Visual Attention and Saliency Detection