AnchorSync: Global Consistency Optimization for Long Video Editing
Zichi Liu, Yinggui Wang, Tao Wei, Chao Ma

TL;DR
AnchorSync is a diffusion-based framework for long video editing. It enforces global consistency and temporal coherence by editing a sparse set of anchor frames and then interpolating the frames between them, and it reports better visual quality and temporal stability than prior methods.
Contribution
It introduces a diffusion-based approach that decouples long video editing into two stages, sparse anchor frame editing and intermediate frame interpolation, improving both edit quality and temporal stability.
Findings
Produces more coherent long videos with fewer artifacts
Outperforms prior methods in visual quality and temporal stability
Effective in maintaining global consistency across long sequences
Abstract
Editing long videos remains a challenging task due to the need for maintaining both global consistency and temporal coherence across thousands of frames. Existing methods often suffer from structural drift or temporal artifacts, particularly in minute-long sequences. We introduce AnchorSync, a novel diffusion-based framework that enables high-quality, long-term video editing by decoupling the task into sparse anchor frame editing and smooth intermediate frame interpolation. Our approach enforces structural consistency through a progressive denoising process and preserves temporal dynamics via multimodal guidance. Extensive experiments show that AnchorSync produces coherent, high-fidelity edits, surpassing prior methods in visual quality and temporal stability.
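The decoupling described in the abstract can be illustrated as a two-stage pipeline: pick sparse anchor frames, edit all anchors together, then fill each gap between consecutive edited anchors. This is a minimal sketch under stated assumptions, not the authors' implementation. The fixed anchor stride, the function names (`select_anchor_indices`, `edit_long_video`, `linear_interpolate`), and the toy "invert brightness" editor are all hypothetical. In AnchorSync both stages are diffusion-based (progressive denoising for anchors, multimodally guided interpolation), and neither stage is implemented here; plain linear blending stands in for the interpolator.

```python
from typing import Callable, List

import numpy as np

Frame = np.ndarray  # one frame as an array; any shape works for this sketch


def select_anchor_indices(num_frames: int, stride: int) -> List[int]:
    """Hypothetical anchor selection: every `stride` frames, always keeping the last frame."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    if num_frames <= 0:
        return []
    indices = list(range(0, num_frames, stride))
    if indices[-1] != num_frames - 1:
        indices.append(num_frames - 1)
    return indices


def linear_interpolate(start: Frame, end: Frame, count: int) -> List[Frame]:
    """Stand-in interpolator (assumption): linear blending, NOT the paper's diffusion model."""
    return [start + (end - start) * (k / (count + 1)) for k in range(1, count + 1)]


def edit_long_video(
    frames: List[Frame],
    edit_anchors: Callable[[List[Frame]], List[Frame]],
    interpolate: Callable[[Frame, Frame, int], List[Frame]],
    stride: int,
) -> List[Frame]:
    anchors = select_anchor_indices(len(frames), stride)

    # Stage 1: edit all anchor frames in one call, so a global editor
    # can keep the anchors consistent with each other.
    edited = edit_anchors([frames[i] for i in anchors])
    if len(edited) != len(anchors):
        raise ValueError("edit_anchors must return one frame per anchor")

    output: list = [None] * len(frames)
    for idx, frame in zip(anchors, edited):
        output[idx] = frame

    # Stage 2: fill every gap between consecutive edited anchors.
    for (a, b), (fa, fb) in zip(zip(anchors, anchors[1:]), zip(edited, edited[1:])):
        gap = b - a - 1
        if gap > 0:
            filled = interpolate(fa, fb, gap)
            if len(filled) != gap:
                raise ValueError("interpolate must return exactly `gap` frames")
            output[a + 1 : b] = filled
    return output


if __name__ == "__main__":
    # Toy "video": 10 constant 2x2 frames whose brightness ramps from 0 to 1.
    video = [np.full((2, 2), t / 9.0) for t in range(10)]

    def invert(anchor_frames: List[Frame]) -> List[Frame]:
        # Hypothetical anchor editor: invert brightness of each anchor.
        return [1.0 - f for f in anchor_frames]

    result = edit_long_video(video, invert, linear_interpolate, stride=4)
    print(select_anchor_indices(len(video), 4))  # [0, 4, 8, 9]
    print([round(float(f[0, 0]), 3) for f in result])
```

The point of the structure is that the expensive, consistency-critical edit touches only the anchors, and every intermediate frame is derived from its two surrounding edited anchors rather than edited independently, which is how drift between distant frames is limited. A linear blend cannot reproduce real motion between anchors; that is the role the paper assigns to its guided diffusion interpolator.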
Taxonomy
Topics: Video Analysis and Summarization · Video Coding and Compression Technologies · Multimedia Communication and Technology
