CamDirector: Towards Long-Term Coherent Video Trajectory Editing

Zhihao Shi; Kejia Yin; Weilin Wan; Yuhongze Zhou; Yuanhao Yu; Xinxin Zuo; Qiang Sun; Juwei Lu

arXiv:2603.02256·cs.CV·March 4, 2026

TL;DR

CamDirector is a video trajectory editing framework that improves long-term coherence and camera control through explicit cross-frame information aggregation and history-guided diffusion, with the goal of turning amateur footage into professionally styled video.

Contribution

The paper proposes a new video trajectory editing (VTE) framework that combines hybrid warping with autoregressive diffusion, improving long-term consistency and camera control over existing methods.

Findings

1. Achieves state-of-the-art performance on the iPhone-PTZ benchmark.
2. Effectively maintains long-term temporal coherence in edited videos.
3. Reduces model complexity while enhancing editing quality.

Abstract

Video (camera) trajectory editing (VTE) aims to synthesize new videos that follow user-defined camera paths while preserving scene content and plausibly inpainting previously unseen regions, upgrading amateur footage into professionally styled videos. Existing VTE methods struggle with precise camera control and long-range consistency because they either inject target poses through a limited-capacity embedding or rely on single-frame warping with only implicit cross-frame aggregation in video diffusion models. To address these issues, we introduce a new VTE framework that 1) explicitly aggregates information across the entire source video via a hybrid warping scheme. Specifically, static regions are progressively fused into a world cache and then rendered to target camera poses, while dynamic regions are directly warped; fusing the two yields globally consistent coarse frames that guide refinement.…
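The hybrid warping step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that per-frame depth maps, dynamic-region masks, a shared pinhole intrinsic matrix `K`, and source and target camera poses are already available; it represents the world cache as a plain coloured point cloud built in one pass rather than progressively; and it pastes dynamic pixels over the static render without a depth test. All function and parameter names (`unproject`, `render_points`, `hybrid_warp`, `dyn_masks`, `src_c2w`, `tgt_w2c`) are hypothetical.

```python
import numpy as np

def _homog(p):
    """Append a column of ones: (N, 3) -> (N, 4)."""
    return np.concatenate([p, np.ones((len(p), 1))], axis=1)

def unproject(depth, K, cam_to_world):
    """Lift every pixel of an (H, W) depth map to world points, shape (H*W, 3)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    cam = (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)
    return (cam_to_world @ _homog(cam).T).T[:, :3]

def render_points(points, colors, K, world_to_cam, shape):
    """Splat coloured points into a target view, keeping the nearest point per pixel."""
    h, w = shape
    cam = (world_to_cam @ _homog(points).T).T[:, :3]
    front = cam[:, 2] > 1e-6                      # drop points behind the camera
    cam, colors = cam[front], colors[front]
    proj = (K @ cam.T).T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    lin, z, colors = (v * w + u)[ok], cam[ok, 2], colors[ok]
    order = np.argsort(z, kind="stable")          # near to far
    _, first = np.unique(lin[order], return_index=True)
    keep = order[first]                           # nearest point for each hit pixel
    image = np.zeros((h * w, 3))
    mask = np.zeros(h * w, dtype=bool)
    image[lin[keep]] = colors[keep]
    mask[lin[keep]] = True
    return image.reshape(h, w, 3), mask.reshape(h, w)

def hybrid_warp(frames, depths, dyn_masks, K, src_c2w, tgt_w2c):
    """Return a list of (coarse_frame, coverage_mask), one per target pose."""
    h, w = depths[0].shape
    lifted = [unproject(d, K, c2w) for d, c2w in zip(depths, src_c2w)]
    flat_rgb = [f.reshape(-1, 3) for f in frames]
    flat_dyn = [m.reshape(-1) for m in dyn_masks]
    # Static pixels from all source frames are pooled into one world cache.
    cache_pts = np.concatenate([p[~m] for p, m in zip(lifted, flat_dyn)])
    cache_rgb = np.concatenate([c[~m] for c, m in zip(flat_rgb, flat_dyn)])
    coarse = []
    for p, c, m, w2c in zip(lifted, flat_rgb, flat_dyn, tgt_w2c):
        s_img, s_mask = render_points(cache_pts, cache_rgb, K, w2c, (h, w))
        # Dynamic pixels are warped from the matching source frame only.
        d_img, d_mask = render_points(p[m], c[m], K, w2c, (h, w))
        img = np.where(d_mask[..., None], d_img, s_img)  # assumption: dynamic on top
        coarse.append((img, s_mask | d_mask))
    return coarse
```

Pixels left uncovered in the returned mask correspond to regions that no source frame observed; in the abstract's framing, these are the areas the refinement stage would need to inpaint.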

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization