StreamingEffect: Real-Time Human-Centric Video Effect Generation
Yiren Song, Cheng Liu, Yuxin Jiang, Mike Zheng Shou

TL;DR
StreamingEffect introduces a real-time, human-centric video effect generation framework that combines in-context editing, a large curated dataset, and efficient distillation to enable high-quality live video editing.
Contribution
The paper presents a novel real-time streaming video effect framework with a new in-context editing architecture, a large dataset, and effective model distillation techniques.
Findings
Enables real-time 720p video editing on a single GPU.
Introduces keyframe control for interactive online effect injection.
Constructs VideoEffect-130K, the largest human-centric video effect dataset.
Abstract
Streaming video effect generation is highly desirable for live human-centric applications such as e-commerce streaming, entertainment, and vlogging, yet remains difficult due to the lack of suitable data and deployable editing models. Unlike generic video generation, this task requires real-time video-to-video editing that adds expressive effects while preserving human identity, background content, and temporal consistency. Existing acceleration efforts mainly focus on text-to-video generation, while efficient distillation for video editing remains largely underexplored. In this paper, we present StreamingEffect, a real-time human-centric streaming video effect framework. We adopt an in-context video editing architecture and train a high-quality bidirectional teacher, then distill it into a causal autoregressive student and further reduce sampling from 50 steps to 4 steps. We…
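The abstract contrasts a bidirectional teacher with a causal autoregressive student, and a 50-step sampler with a 4-step one. The sketch below is only an illustration of those two ideas, not the paper's method: the function names, the per-frame token layout, and the evenly spaced step selection are all assumptions made for this example. In particular, picking a subset of steps is not distillation by itself; a distilled student would be trained to produce good outputs at the reduced step count.

```python
def frame_causal_mask(num_frames, tokens_per_frame):
    """Build a frame-level causal attention mask (hypothetical helper).

    mask[q][k] is True when query token q may attend to key token k.
    Tokens attend to every token in their own frame and in earlier frames,
    but never to later frames. A bidirectional model would use an all-True mask.
    """
    n = num_frames * tokens_per_frame
    mask = []
    for q in range(n):
        q_frame = q // tokens_per_frame
        mask.append([(k // tokens_per_frame) <= q_frame for k in range(n)])
    return mask


def subsample_steps(num_teacher_steps, num_student_steps):
    """Pick evenly spaced step indices, noisiest first (assumed scheme).

    Uses integer arithmetic so the result does not depend on float rounding.
    """
    return [
        num_teacher_steps - 1 - (i * num_teacher_steps) // num_student_steps
        for i in range(num_student_steps)
    ]


if __name__ == "__main__":
    # Two frames with two tokens each: frame 0 cannot see frame 1.
    for row in frame_causal_mask(2, 2):
        print(row)
    # Indices into a 50-step schedule that a 4-step sampler might visit.
    print(subsample_steps(50, 4))
```

A frame-level causal mask is what lets a streaming model emit each frame without waiting for future frames, which a bidirectional model cannot do.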
