Streaming Video Diffusion: Online Video Editing with Diffusion Models

Feng Chen; Zhen Yang; Bohan Zhuang; Qi Wu

arXiv:2405.19726 · cs.CV · May 31, 2024



TL;DR

This paper introduces Streaming Video Diffusion (SVDiff), a real-time online video editing method that adds temporal recurrence to a diffusion model so that streaming frames are edited with temporal consistency. It targets live applications such as live streaming and online chat.

Contribution

The paper proposes SVDiff, a novel diffusion-based framework for online video editing that handles streaming frames with temporal coherence and zero-shot capabilities.

Findings

01 Achieves 15.2 FPS inference speed at 512x512 resolution.

02 Effectively edits long, high-quality videos with temporal consistency.

03 Supports a broad range of videos with a single model.

Abstract

We present a novel task called online video editing, which is designed to edit streaming frames while maintaining temporal consistency. Unlike existing offline video editing assuming all frames are pre-established and accessible, online video editing is tailored to real-life applications such as live streaming and online chat, requiring (1) fast continual step inference, (2) long-term temporal modeling, and (3) zero-shot video editing capability. To solve these issues, we propose Streaming Video Diffusion (SVDiff), which incorporates the compact spatial-aware temporal recurrence into off-the-shelf Stable Diffusion and is trained with the segment-level scheme on large-scale long videos. This simple yet effective setup allows us to obtain a single model that is capable of executing a broad range of videos and editing each streaming frame with temporal coherence. Our experiments…
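The online setting described in the abstract can be illustrated with a small sketch. It is not the SVDiff architecture: it only shows the general shape of streaming inference, where each incoming frame is edited once using a compact state carried over from earlier frames, with no access to future frames. The names `edit_stream`, `toy_edit_step`, and `frame_budget_ms`, and the blending rule itself, are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Frame = List[float]  # toy stand-in for an image or latent
State = List[float]  # toy stand-in for a compact recurrent memory


def toy_edit_step(frame: Frame, state: State, alpha: float = 0.5) -> Tuple[Frame, State]:
    """Hypothetical per-frame step: blend the current frame into the memory.

    A real system would run a diffusion model here; this toy rule only
    demonstrates that the output depends on past frames through `state`.
    """
    new_state = [alpha * s + (1.0 - alpha) * f for s, f in zip(state, frame)]
    edited = list(new_state)  # the edited frame is read from the updated memory
    return edited, new_state


def edit_stream(
    frames: Iterable[Frame],
    step: Callable[[Frame, State], Tuple[Frame, State]],
    init_state: State,
) -> Iterator[Frame]:
    """Edit frames one at a time, carrying the state forward (online, causal)."""
    state = init_state
    for frame in frames:
        edited, state = step(frame, state)
        yield edited  # each result is emitted before the next frame is read


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    stream = [[1.0, 1.0], [0.0, 0.0]]
    for out in edit_stream(stream, toy_edit_step, [0.0, 0.0]):
        print(out)
    print(round(frame_budget_ms(15.2), 1), "ms per frame at 15.2 FPS")
```

Because `edit_stream` is a generator, it never buffers the whole video, which is the property that separates online editing from the offline setting. The helper `frame_budget_ms` converts the reported 15.2 FPS into a per-frame time budget of roughly 66 ms.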


Code & Models

Repositories

Chenfeng1271/SVDiff (Official)


Taxonomy

Topics: Multimedia Communication and Technology · Digital Rights Management and Security

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion