Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing
Tongtong Su, Chengyu Wang, Jun Huang, Dongming Lu

TL;DR
This paper introduces Zero-to-Hero, a reference-based video editing method that achieves accurate, temporally consistent edits starting from zero-shot initialization, outperforming existing approaches in quality and robustness.
Contribution
It proposes a novel zero-shot initialization approach for reference-based video editing that enhances accuracy and temporal consistency, with a robust correspondence-guided attention mechanism.
Findings
Outperforms the baseline by 2.6 dB in PSNR
Uses correspondence-guided attention for robustness against large motions
Provides a deterministic evaluation framework with Blender-generated videos
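To put the PSNR figure in context, here is a minimal sketch of how PSNR is commonly computed for a pair of frames and averaged over a clip. The function names (`psnr`, `video_psnr`), the 8-bit peak value of 255, and the per-frame averaging are assumptions made for illustration; the summary does not state the paper's exact evaluation protocol.

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped frames."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("frames must have the same shape")
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(max_value ** 2 / mse))

def video_psnr(reference_frames, estimate_frames, max_value=255.0):
    """Mean of per-frame PSNR (an assumed aggregation, not the paper's stated one)."""
    scores = [psnr(r, e, max_value)
              for r, e in zip(reference_frames, estimate_frames)]
    return float(np.mean(scores))

# Usage: a frame that is off by exactly one intensity level everywhere.
frame = np.full((4, 4, 3), 100, dtype=np.uint8)
shifted = frame + 1
score = psnr(frame, shifted)  # MSE is 1, so this equals 10 * log10(255**2)
```

Because PSNR is logarithmic in the mean squared error, a 2.6 dB gain at a fixed peak value corresponds to an MSE ratio of 10^0.26, about 1.82, so the error is roughly 45% lower than the baseline's.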
Abstract
Appearance editing according to user needs is a pivotal task in video editing. Existing text-guided methods often lead to ambiguities regarding user intentions and restrict fine-grained control over editing specific aspects of objects. To overcome these limitations, this paper introduces a novel approach named Zero-to-Hero, which focuses on reference-based video editing that disentangles the editing process into two distinct problems. It achieves this by first editing an anchor frame to satisfy user requirements as a reference image and then consistently propagating its appearance across other frames. We leverage correspondence within the original frames to guide the attention mechanism, which is more robust than previously proposed optical flow or temporal modules in memory-friendly video generative models, especially when dealing with objects exhibiting large motions. It offers a…
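The idea of letting correspondence in the original frames guide attention can be sketched as follows. This is one plausible reading of the abstract, not the authors' implementation: attention weights are computed between features of the original target frame and the original anchor frame, and those weights are then used to gather features from the edited anchor. The function name, argument names, and feature shapes are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def correspondence_guided_attention(q_target_orig, k_anchor_orig, v_anchor_edited):
    """Propagate edited anchor features using original-frame correspondence.

    q_target_orig:   (N, d) features of the ORIGINAL target frame (queries)
    k_anchor_orig:   (M, d) features of the ORIGINAL anchor frame (keys)
    v_anchor_edited: (M, c) features of the EDITED anchor frame (values)
    Returns the (N, c) propagated features and the (N, M) attention weights.
    """
    d = q_target_orig.shape[-1]
    # Similarity is measured only between unedited frames, so the edit
    # itself cannot disturb the matching.
    logits = q_target_orig @ k_anchor_orig.T / np.sqrt(d)
    weights = softmax(logits, axis=-1)
    return weights @ v_anchor_edited, weights

# Usage: three anchor tokens whose positions are permuted in the target frame.
k_anchor = 50.0 * np.eye(3)                # distinct, well-separated features
q_target = k_anchor[[2, 0, 1]]             # same content, shuffled positions
v_edited = np.arange(6, dtype=np.float64).reshape(3, 2)
out, w = correspondence_guided_attention(q_target, k_anchor, v_edited)
```

In this toy case each target token attends almost exclusively to its matching anchor token, so `out` reproduces the edited values in the target's ordering regardless of how far the tokens moved, which is the intuition behind robustness to large motions.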
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Image Enhancement Techniques
Methods: Attention Is All You Need · Softmax · RoIAlign · RoIPool · Sparse Evolutionary Training
