Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations
Yu-Hui Chen, Raman Sarokin, Juhyun Lee, Jiuqiang Tang, Chuo-Ling Chang, Andrei Kulik, Matthias Grundmann

TL;DR
This paper introduces GPU-aware implementation optimizations that accelerate large diffusion models on mobile devices. They bring Stable Diffusion 1.4 inference under 12 seconds on a Samsung S23 Ultra and broaden the practical uses of on-device generative AI.
Contribution
The authors develop implementation optimizations that achieve the fastest reported inference latency to date for large diffusion models on GPU-equipped mobile devices, without using int8 quantization.
Findings
Inference latency under 12 seconds on Samsung S23 Ultra (Stable Diffusion 1.4, 512x512 image, 20 iterations, no int8 quantization)
Optimizations make on-device use of large diffusion models practical
Broader applicability of generative AI on resource-constrained devices
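As a rough reading of the headline number: with 20 iterations and a total under 12 seconds, each denoising iteration can take at most about 0.6 s on average, and less once any non-iterative work (for Stable Diffusion, the text encoder and the image decoder) is subtracted. The sketch below assumes the remaining budget splits evenly across iterations; `per_iteration_budget_s` and its `overhead_s` parameter are hypothetical names for illustration, not part of the paper.

```python
def per_iteration_budget_s(total_latency_s: float, iterations: int,
                           overhead_s: float = 0.0) -> float:
    """Upper bound on average time per denoising iteration.

    Hypothetical helper: assumes the end-to-end latency minus any fixed,
    non-iterative overhead is split evenly across the iterations.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    if overhead_s < 0 or overhead_s > total_latency_s:
        raise ValueError("overhead must lie between 0 and the total latency")
    return (total_latency_s - overhead_s) / iterations


# Headline figures from the abstract: under 12 s for 20 iterations.
# With no overhead assumed, this is the loosest possible bound.
budget = per_iteration_budget_s(12.0, 20)
print(f"{budget:.2f} s per iteration at most")  # 0.60 s per iteration at most
```

Any measured overhead for the encoder and decoder would be passed as `overhead_s`, tightening the per-iteration bound accordingly.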
Abstract
The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to date (under 12 seconds for Stable Diffusion 1.4 without int8 quantization on Samsung S23 Ultra for a 512x512 image with 20 iterations) on GPU-equipped mobile devices. These enhancements broaden the applicability of…
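To make the memory pressure behind "over 1 billion parameters" concrete, here is a back-of-the-envelope sketch of the storage needed for the weights alone at a few precisions. The abstract states only that int8 quantization was not used; the precision list, the exact 1e9 parameter count, and the helper name `weight_footprint_gib` are illustrative assumptions, and activations, workspace buffers, and runtime overhead are ignored.

```python
# Bytes per parameter for a few common storage formats (illustrative choice).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_footprint_gib(num_params: int, dtype: str) -> float:
    """Weights-only storage in GiB (1 GiB = 2**30 bytes).

    Hypothetical helper: counts parameters times bytes per parameter and
    nothing else (no activations, intermediate buffers, or runtime overhead).
    """
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumed round figure for "over 1 billion parameters".
NUM_PARAMS = 1_000_000_000

for dtype in ("float32", "float16", "int8"):
    print(f"{dtype:>7}: {weight_footprint_gib(NUM_PARAMS, dtype):.2f} GiB")
# float32: 3.73 GiB
# float16: 1.86 GiB
#    int8: 0.93 GiB
```

Even before counting activations, skipping int8 quantization roughly doubles or quadruples the weight storage relative to int8, which is why the memory limits of mobile devices matter here.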
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Single-cell and spatial transcriptomics
Methods: Diffusion
