CamDirector: Towards Long-Term Coherent Video Trajectory Editing
Zhihao Shi, Kejia Yin, Weilin Wan, Yuhongze Zhou, Yuanhao Yu, Xinxin Zuo, Qiang Sun, Juwei Lu

TL;DR
CamDirector introduces a video trajectory editing framework that improves long-term coherence and camera control through explicit information aggregation and history-guided diffusion, turning amateur footage into professionally styled video.
Contribution
The paper proposes a new video trajectory editing (VTE) framework that combines hybrid warping with autoregressive diffusion, improving long-term consistency and camera control over existing methods.
Findings
Achieves state-of-the-art performance on the iPhone-PTZ benchmark.
Effectively maintains long-term temporal coherence in edited videos.
Reduces model complexity while enhancing editing quality.
Abstract
Video (camera) trajectory editing (VTE) aims to synthesize new videos that follow user-defined camera paths while preserving scene content and plausibly inpainting previously unseen regions, upgrading amateur footage into professionally styled videos. Existing VTE methods struggle with precise camera control and long-range consistency because they either inject target poses through a limited-capacity embedding or rely on single-frame warping with only implicit cross-frame aggregation in video diffusion models. To address these issues, we introduce a new VTE framework that 1) explicitly aggregates information across the entire source video via a hybrid warping scheme. Specifically, static regions are progressively fused into a world cache and then rendered to the target camera poses, while dynamic regions are directly warped; their fusion yields globally consistent coarse frames that guide refinement.…
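The hybrid warping idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes per-frame depth maps, dynamic-region masks, shared pinhole intrinsics, and camera-to-world poses are already available, and it approximates the "world cache" with a plain colored point cloud built in one pass rather than fused progressively. The function names (`unproject`, `render`, `hybrid_warp`) and the z-buffered fusion of static and dynamic points are hypothetical choices made for illustration only.

```python
import numpy as np

def unproject(depth, K, cam_to_world):
    """Lift every pixel of a depth map to a world-space point, shape (H*W, 3)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T              # camera-space rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)
    pts_h = np.concatenate([pts_cam, np.ones((len(pts_cam), 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]

def render(points, colors, K, world_to_cam, hw):
    """Splat colored points into a target view, keeping the nearest point per pixel."""
    h, w = hw
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    pts_cam = (pts_h @ world_to_cam.T)[:, :3]
    front = pts_cam[:, 2] > 1e-6                 # drop points behind the camera
    pts_cam, colors = pts_cam[front], colors[front]
    proj = pts_cam @ K.T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    flat = (v * w + u)[inside]
    z, colors = pts_cam[inside, 2], colors[inside]
    order = np.lexsort((z, flat))                # by pixel index, then by depth
    flat_sorted = flat[order]
    first = np.ones(len(flat_sorted), dtype=bool)
    first[1:] = flat_sorted[1:] != flat_sorted[:-1]
    keep = order[first]                          # nearest point for each pixel
    image = np.zeros((h * w, 3))
    valid = np.zeros(h * w, dtype=bool)
    image[flat[keep]] = colors[keep]
    valid[flat[keep]] = True
    return image.reshape(h, w, 3), valid.reshape(h, w)

def hybrid_warp(frames, depths, dynamic_masks, K, src_poses, tgt_poses):
    """Return one (coarse_frame, valid_mask) pair per target pose.

    Static pixels from ALL source frames go into a shared point cache;
    dynamic pixels are warped only from the time-aligned source frame.
    Poses are 4x4 camera-to-world matrices (an assumption of this sketch).
    """
    hw = depths[0].shape
    cache_pts, cache_cols = [], []
    for img, depth, mask, pose in zip(frames, depths, dynamic_masks, src_poses):
        static = ~mask.reshape(-1)
        cache_pts.append(unproject(depth, K, pose)[static])
        cache_cols.append(img.reshape(-1, 3)[static])
    cache_pts = np.concatenate(cache_pts)
    cache_cols = np.concatenate(cache_cols)

    coarse = []
    for img, depth, mask, pose, tgt in zip(frames, depths, dynamic_masks,
                                           src_poses, tgt_poses):
        dyn = mask.reshape(-1)
        pts = np.concatenate([cache_pts, unproject(depth, K, pose)[dyn]])
        cols = np.concatenate([cache_cols, img.reshape(-1, 3)[dyn]])
        coarse.append(render(pts, cols, K, np.linalg.inv(tgt), hw))
    return coarse
```

Pixels where the returned mask is `False` are holes that no source frame covers. In a pipeline of the kind the abstract describes, those would be left to the diffusion-based refinement stage, which this sketch does not model.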
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization
