StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

Akio Kodaira; Chenfeng Xu; Toshiki Hazama; Takanori Yoshimoto; Kohei Ohno; Shogo Mitsuhori; Soichi Sugano; Hanying Cho; Zhijian Liu; Masayoshi Tomizuka; Kurt Keutzer

arXiv:2312.12491 · cs.CV · July 9, 2025 · 6 cites

TL;DR

StreamDiffusion introduces a real-time diffusion pipeline with batching, novel guidance, and filtering techniques, enabling high-throughput, energy-efficient interactive image generation suitable for live scenarios like Metaverse and streaming.

Contribution

The paper presents a novel batching approach, residual classifier-free guidance, and a stochastic similarity filter to significantly improve real-time diffusion image generation.

Findings

01. Achieves a 1.5x speedup over sequential denoising.
02. Up to 2.05x faster with residual guidance.
03. Reaches 91.07 fps on an RTX 4090 with reduced energy consumption.

Abstract

We introduce StreamDiffusion, a real-time diffusion pipeline designed for interactive image generation. Existing diffusion models are adept at creating images from text or image prompts, yet they often fall short in real-time interaction. This limitation becomes particularly evident in scenarios involving continuous input, such as Metaverse, live video streaming, and broadcasting, where high throughput is imperative. To address this, we present a novel approach that transforms the original sequential denoising into the batching denoising process. Stream Batch eliminates the conventional wait-and-interact approach and enables fluid and high throughput streams. To handle the frequency disparity between data input and model throughput, we design a novel input-output queue for parallelizing the streaming process. Moreover, the existing diffusion pipeline uses classifier-free guidance (CFG),…
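The batching idea in the abstract can be illustrated with a toy scheduler. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each input frame needs a fixed number of denoising steps, and that one batched model call can advance several in-flight frames, each at a different step, by one step. The names `denoise_batch`, `stream_batch`, and `NUM_STEPS` are hypothetical, and the "model" is a stand-in that only records which step was applied.

```python
from collections import deque

NUM_STEPS = 4  # hypothetical number of denoising steps per frame


def denoise_batch(latents, steps):
    """Stand-in for ONE batched model call (assumption, not a real U-Net).

    Each latent is a list of the step indices applied so far; the call
    appends the current step index to every latent in the batch.
    """
    return [latent + [step] for latent, step in zip(latents, steps)]


def stream_batch(frames, num_steps=NUM_STEPS):
    """Advance all in-flight frames by one step per batched call.

    Returns the finished (frame, latent) pairs in input order and the
    number of batched model calls that were made.
    """
    pending = deque(frames)
    in_flight = deque()  # entries are (frame, latent, next_step)
    outputs = []
    calls = 0
    while pending or in_flight:
        # A new frame joins the batch at step 0 while older frames continue.
        if pending:
            in_flight.append((pending.popleft(), [], 0))
        ids = [frame for frame, _, _ in in_flight]
        latents = [latent for _, latent, _ in in_flight]
        steps = [step for _, _, step in in_flight]
        new_latents = denoise_batch(latents, steps)
        calls += 1
        in_flight = deque(
            (frame, latent, step + 1)
            for frame, latent, step in zip(ids, new_latents, steps)
        )
        # The oldest frame leaves the batch once it has taken every step.
        if in_flight and in_flight[0][2] == num_steps:
            frame, latent, _ = in_flight.popleft()
            outputs.append((frame, latent))
    return outputs, calls


if __name__ == "__main__":
    frames = list("abcde")
    outputs, calls = stream_batch(frames, NUM_STEPS)
    sequential_calls = len(frames) * NUM_STEPS
    print(f"batched calls: {calls}, sequential calls: {sequential_calls}")
```

In this toy, five frames with four steps each take 8 batched calls instead of 20 single-frame calls, and after warm-up one frame finishes per call. Each batched call processes up to `NUM_STEPS` latents, so the call count alone overstates the wall-clock gain; the measured speedup depends on how well the hardware handles the larger batch.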

Code & Models

Repositories

cumulo-autumn/streamdiffusion (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Video Coding and Compression Technologies

Methods: Max Pooling · Convolution · Concatenated Skip Connection · U-Net · Diffusion