Approximate Caching for Efficiently Serving Diffusion Models
Shubham Agarwal, Subrata Mitra, Sarthak Chakraborty, Srikrishna Karanam, Koyel Mukherjee, Shiv Saini

TL;DR
This paper introduces approximate-caching, which reduces resource use and latency in diffusion-model-based text-to-image generation by reusing intermediate states from earlier generations. The Nirvana system built on it reports 19.8% lower end-to-end latency and 19% cost savings on average.
Contribution
The paper proposes a novel approximate-caching technique with a new cache management policy, improving efficiency and reducing costs in production diffusion model serving.
Findings
19.8% reduction in end-to-end latency
19% cost savings on average
Effective reuse of intermediate states in real workloads
Abstract
Text-to-image generation using diffusion models has seen explosive popularity owing to their ability to produce high-quality images that adhere to text prompts. However, production-grade diffusion model serving is a resource-intensive task that not only requires expensive high-end GPUs but also incurs considerable latency. In this paper, we introduce a technique called approximate-caching that reduces the number of iterative denoising steps needed to generate an image from a prompt by reusing intermediate noise states created during a prior image generation for a similar prompt. Based on this idea, we present an end-to-end text-to-image system, Nirvana, that uses approximate-caching with a novel cache-management policy, Least Computationally Beneficial and Frequently Used (LCBFU), to provide % GPU compute savings, 19.8% end-to-end latency reduction and 19% dollar savings, on average,…
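The idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Nirvana implementation: the names `ApproxCache`, `SKIP_TABLE`, and `allowed_skip`, the similarity thresholds, and the skip counts are all hypothetical. The eviction score (steps saved times hit count) is one plausible reading of the LCBFU name, not a confirmed detail from the paper. Prompt embeddings are plain lists of floats and noise states are opaque objects.

```python
import math
from dataclasses import dataclass, field

# Assumed mapping from prompt similarity to the number of initial denoising
# steps that may be skipped. Thresholds and step counts are illustrative only.
SKIP_TABLE = [(0.95, 25), (0.85, 15), (0.75, 5)]


@dataclass(eq=False)
class Entry:
    embedding: list                            # embedding of the cached prompt
    states: dict                               # K -> noise state after K steps
    hits: dict = field(default_factory=dict)   # K -> number of times reused


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def allowed_skip(similarity):
    """Largest number of steps we allow skipping at this similarity."""
    for threshold, k in SKIP_TABLE:
        if similarity >= threshold:
            return k
    return 0


class ApproxCache:
    def __init__(self, capacity):
        self.capacity = capacity   # max stored states across all prompts
        self.entries = []

    def size(self):
        return sum(len(e.states) for e in self.entries)

    def lookup(self, query):
        """Return (K, state) to resume denoising at step K, or (0, None)."""
        if not self.entries:
            return 0, None
        best = max(self.entries, key=lambda e: cosine(query, e.embedding))
        limit = allowed_skip(cosine(query, best.embedding))
        usable = [k for k in best.states if k <= limit]
        if not usable:
            return 0, None
        k = max(usable)
        best.hits[k] = best.hits.get(k, 0) + 1
        return k, best.states[k]

    def insert(self, embedding, states):
        self.entries.append(Entry(list(embedding), dict(states)))
        while self.size() > self.capacity:
            self._evict_one()

    def _evict_one(self):
        # Assumed LCBFU-style score: compute saved per hit (K) times hit count.
        # The state with the lowest score is evicted; ties go to the oldest.
        entry, k = min(
            ((e, k) for e in self.entries for k in e.states),
            key=lambda pair: pair[1] * pair[0].hits.get(pair[1], 0),
        )
        del entry.states[k]
        entry.hits.pop(k, None)
        if not entry.states:
            self.entries = [e for e in self.entries if e is not entry]
```

On a hit, the caller would run only the remaining denoising steps starting from the returned state. On a miss it would run the full schedule and insert the intermediate states it produced. Under this score, a state that skips many steps and is reused often is kept, while a rarely hit state that skips few steps is evicted first.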
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Recommender Systems and Techniques · Caching and Content Delivery
Methods: Diffusion
