VIA: Unified Spatiotemporal Video Adaptation Framework for Global and Local Video Editing
Jing Gu, Yuwei Fang, Ivan Skorokhodov, Peter Wonka, Xinya Du, Sergey Tulyakov, Xin Eric Wang

TL;DR
VIA is a unified framework that enables consistent and precise global and local editing of long videos by adapting pre-trained models for spatiotemporal coherence and control.
Contribution
The paper introduces VIA, a novel spatiotemporal video adaptation framework that improves long video editing consistency and local control through test-time and recursive attention strategies.
Findings
Produces more faithful and coherent video edits
Achieves consistent long video editing in minutes
Outperforms baseline methods in accuracy and control
Abstract
Video editing serves as a fundamental pillar of digital media, spanning applications in entertainment, education, and professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to inaccurate and inconsistent edits in the spatiotemporal dimension, especially for long videos. In this paper, we introduce VIA, a unified spatiotemporal Video Adaptation framework for global and local video editing, pushing the limits of consistently editing minute-long videos. First, to ensure local consistency within individual frames, we design test-time editing adaptation, which adapts a pre-trained image editing model to improve consistency between potential editing directions and the text instruction, and adapts masked latent variables for precise local control. Furthermore, to maintain global consistency…
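The abstract mentions adapting masked latent variables for local control but does not spell out the procedure. As a rough illustration only, the sketch below shows generic masked latent blending, a common way to confine an edit to a region: edited values are kept inside a mask and source values are kept outside it. The function name `blend_masked_latents`, the array shapes, and the toy values are assumptions made for this example and are not taken from the paper; VIA's actual adaptation step may differ.

```python
import numpy as np


def blend_masked_latents(source, edited, mask):
    """Hypothetical helper: keep `edited` where mask == 1, `source` elsewhere.

    This is generic masked blending, not a confirmed description of VIA.
    """
    mask = np.asarray(mask, dtype=source.dtype)
    return mask * edited + (1.0 - mask) * source


# Toy 1x4x4 "latent": restrict the edit to the top-left 2x2 region.
source = np.zeros((1, 4, 4), dtype=np.float32)  # stand-in for the original latent
edited = np.ones((1, 4, 4), dtype=np.float32)   # stand-in for the edited latent
mask = np.zeros((1, 4, 4), dtype=np.float32)
mask[:, :2, :2] = 1.0                           # region where the edit applies

blended = blend_masked_latents(source, edited, mask)
```

With a binary mask, everything outside the masked region is left exactly equal to the source latent, which is the property that makes this kind of blending useful for local edits.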
Taxonomy
Topics: Video Analysis and Summarization · Advanced Vision and Imaging
Methods: Softmax · Attention Is All You Need
