StreamGVE: Training-Free Video Editing via Few-Step Streaming Video Generation
Guanlong Jiao, Chenyangguang Zhang, Jia Jun Cheng Xian, Zewei Zhang, Renjie Liao

TL;DR
StreamGVE is a training-free, noise-to-data video editing approach built on pre-trained streaming generation models. It enables high-quality editing with few sampling steps and reduced computational cost.
Contribution
The paper recasts video editing as noise-to-data generation and introduces dual-branch fast sampling, source-oriented guidance, and visual prompting, aiming to improve efficiency and flexibility.
Findings
Outperforms existing methods across various video editing tasks.
Achieves high-quality results with few-step sampling and minimal time.
Demonstrates robustness and generalizability across models.
Abstract
Although existing video editing methods are generally feasible, they often require many costly iterations and still struggle to deliver high-quality, satisfying editing results. We attribute this limitation to the prevalent data-to-data paradigm, which is less compatible with modern generative models than noise-to-data generation. To address this gap, we revisit video editing from a noise-to-data perspective and propose Streaming-Generation-based Video Editing (StreamGVE), which preserves few-step sampling while seamlessly injecting source-video conditions. Built on pre-trained streaming generation models, StreamGVE introduces dual-branch fast sampling with a self-attention bridge and cross-attention grounding/boosting to satisfy both sampling and conditioning requirements. We further propose source-oriented guidance to improve target-generation quality, and a visual prompting…
