SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds
Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, Jian Ren

TL;DR
SnapFusion introduces a highly efficient text-to-image diffusion model capable of generating high-quality images on mobile devices in under two seconds, making advanced AI art accessible and private.
Contribution
The paper presents a novel, optimized UNet architecture and step distillation techniques that enable fast, high-quality image generation on mobile devices, a significant improvement over existing models.
Findings
Achieves inference in under 2 seconds on mobile devices
Outperforms Stable Diffusion v1.5 with fewer steps
Maintains high image quality with improved FID and CLIP scores
Abstract
Text-to-image diffusion models can create stunning images from natural language descriptions that rival the work of professional artists and photographers. However, these models are large, with complex network architectures and tens of denoising iterations, making them computationally expensive and slow to run. As a result, high-end GPUs and cloud-based inference are required to run diffusion models at scale. This is costly and has privacy implications, especially when user data is sent to a third party. To overcome these challenges, we present a generic approach that, for the first time, unlocks running text-to-image diffusion models on mobile devices in less than 2 seconds. We achieve this by introducing an efficient network architecture and improving step distillation. Specifically, we propose an efficient UNet by identifying the redundancy of the original model and reducing the…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computational and Text Analysis Methods
Methods: Diffusion · Contrastive Language-Image Pre-training
