StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
Akio Kodaira, Chenfeng Xu, Toshiki Hazama, Takanori Yoshimoto, Kohei Ohno, Shogo Mitsuhori, Soichi Sugano, Hanying Cho, Zhijian Liu, Masayoshi Tomizuka, Kurt Keutzer

TL;DR
StreamDiffusion introduces a real-time diffusion pipeline that combines batched denoising, residual classifier-free guidance, and a stochastic similarity filter, enabling high-throughput, energy-efficient interactive image generation for live scenarios such as the Metaverse and video streaming.
Contribution
The paper presents a novel batching approach, residual classifier-free guidance, and a stochastic similarity filter to significantly improve real-time diffusion image generation.
Findings
Stream Batch achieves a 1.5x speedup over sequential denoising.
Residual classifier-free guidance is up to 2.05x faster than conventional classifier-free guidance.
The pipeline reaches 91.07 fps on an RTX 4090, with reduced energy consumption.
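As a rough sanity check on the throughput figure, the reported frame rate can be converted into a per-frame time budget. The snippet below is only arithmetic on the number quoted above; the variable names are illustrative.

```python
# Convert the reported throughput into an average time budget per frame.
fps = 91.07  # frames per second, as reported above for an RTX 4090
frame_budget_ms = 1000.0 / fps  # milliseconds available per generated frame

print(f"{frame_budget_ms:.2f} ms per frame")  # prints "10.98 ms per frame"
```

In other words, the whole pipeline has roughly 11 ms on average to produce each image at that rate.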
Abstract
We introduce StreamDiffusion, a real-time diffusion pipeline designed for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction. This limitation becomes particularly evident in scenarios involving continuous input, such as Metaverse, live video streaming, and broadcasting, where high throughput is imperative. To address this, we present a novel approach that transforms the original sequential denoising into the batching denoising process. Stream Batch eliminates the conventional wait-and-interact approach and enables fluid and high throughput streams. To handle the frequency disparity between data input and model throughput, we design a novel input-output queue for parallelizing the streaming process. Moreover, the existing diffusion pipeline uses classifier-free guidance (CFG),…
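The batching idea described in the abstract can be illustrated with a toy scheduler. This is a minimal sketch under stated assumptions, not the authors' implementation: `denoise_step` is a hypothetical stand-in for one U-Net call, a "latent" is just a list recording which steps were applied, and `stream_batch` is an illustrative name. Frames enter the batch one at a time, so the batch holds frames at staggered denoising steps, and a single batched call advances all of them together.

```python
from collections import deque


def denoise_step(latent, step):
    # Hypothetical stand-in for one U-Net denoising call:
    # here it only records which step was applied.
    return latent + [step]


def stream_batch(frames, num_steps):
    """Toy staggered-batch scheduler.

    Returns the finished frames in order and the number of batched calls made.
    """
    pending = deque(frames)   # frames that have not entered the pipeline yet
    batch = deque()           # in-flight items: (frame_id, latent, next_step)
    outputs = []
    calls = 0

    while pending or batch:
        # Admit one new frame per iteration, as in a live input stream.
        if pending:
            batch.append((pending.popleft(), [], 0))

        # One batched call advances every in-flight frame by one step.
        calls += 1
        advanced = [(fid, denoise_step(lat, s), s + 1) for fid, lat, s in batch]

        # Frames that completed all steps leave the batch as outputs.
        batch = deque()
        for fid, lat, s in advanced:
            if s == num_steps:
                outputs.append((fid, lat))
            else:
                batch.append((fid, lat, s))

    return outputs, calls


outputs, calls = stream_batch(range(5), num_steps=3)
print(calls)                           # 7 batched calls for 5 frames
print([fid for fid, _ in outputs])     # [0, 1, 2, 3, 4]
```

In this toy, 5 frames with 3 denoising steps each need 7 batched calls (frames + steps - 1) instead of 15 sequential single-frame calls, and after a short warm-up every call emits one finished frame. A batched call costs more than a single-frame call on real hardware, so the call count here is not a measured speedup.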
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Video Coding and Compression Technologies
Methods: Max Pooling · Convolution · Concatenated Skip Connection · U-Net · Diffusion
