TL;DR
Warp-as-History enables camera-controlled video generation from a single training video without additional training or optimization, by converting camera warps into pseudo-history for improved trajectory following.
Contribution
The paper introduces an interface that feeds camera-induced warps to the model as pseudo-history, enabling zero-shot camera trajectory control in frozen models and improving performance further with minimal finetuning.
Findings
Zero-shot camera trajectory following is achieved without training or optimization.
Lightweight LoRA finetuning improves trajectory adherence and visual quality.
The method generalizes well across diverse datasets.
Abstract
Camera-controlled video generation has made substantial progress, enabling generated videos to follow prescribed viewpoint trajectories. However, existing methods usually learn camera-specific conditioning through camera encoders, control branches, or attention and positional-encoding modifications, which often require post-training on large-scale camera-annotated videos. Training-free alternatives avoid such post-training, but often shift the cost to test-time optimization or extra denoising-time guidance. We propose Warp-as-History, a simple interface that turns camera-induced warps into camera-warped pseudo-history with target-frame positional alignment and visible-token selection. Given a target camera trajectory, we construct camera-warped pseudo-history from past observations and feed it through the model's visual-history pathway. Crucially, we align its positional encoding with…
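The two ingredients named in the abstract, a camera-induced warp and visible-token selection, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the warp is a depth-based reprojection with known per-pixel depth and shared intrinsics, and that a token is kept when at least half of its patch is covered. The function names `warp_to_target` and `select_visible_tokens`, and the parameters `patch` and `min_frac`, are hypothetical.

```python
import numpy as np

def warp_to_target(image, depth, K, R, t):
    """Forward-warp a source frame into a target camera (illustrative only).

    image: (H, W, C) source frame; depth: (H, W) source-camera depth (assumed given).
    K: (3, 3) intrinsics shared by both cameras (assumption).
    R, t: rotation (3, 3) and translation (3,) from source to target camera coords.
    Returns the warped frame and a boolean mask of target pixels that were hit.
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    # Back-project pixels to 3D points in the source camera.
    pts = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)
    # Move the points into the target camera and project them.
    pts_t = R @ pts + t.reshape(3, 1)
    z = pts_t[2]
    proj = K @ pts_t
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)
    ut = np.round(proj[0] / safe_z).astype(int)
    vt = np.round(proj[1] / safe_z).astype(int)
    valid = in_front & (ut >= 0) & (ut < W) & (vt >= 0) & (vt < H)

    ut_v, vt_v, z_v = ut[valid], vt[valid], z[valid]
    src = image.reshape(-1, image.shape[-1])[valid]
    # Z-buffer: when several source pixels land on one target pixel, keep the nearest.
    zbuf = np.full((H, W), np.inf)
    np.minimum.at(zbuf, (vt_v, ut_v), z_v)
    winner = z_v <= zbuf[vt_v, ut_v]

    warped = np.zeros_like(image)
    visible = np.zeros((H, W), dtype=bool)
    warped[vt_v[winner], ut_v[winner]] = src[winner]
    visible[vt_v, ut_v] = True
    return warped, visible

def select_visible_tokens(visible, patch, min_frac=0.5):
    """Return flat indices, on the target-frame token grid, of patches whose
    covered fraction is at least `min_frac` (a hypothetical threshold)."""
    H, W = visible.shape
    gh, gw = H // patch, W // patch
    blocks = visible[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    frac = blocks.mean(axis=(1, 3))
    return np.flatnonzero((frac >= min_frac).ravel())

# Example: an 8x8 frame, unit depth, camera shifted so content moves 2 px right.
K = np.array([[4.0, 0.0, 4.0], [0.0, 4.0, 4.0], [0.0, 0.0, 1.0]])
image = np.arange(8 * 8 * 3, dtype=np.float64).reshape(8, 8, 3)
depth = np.ones((8, 8))
warped, visible = warp_to_target(image, depth, K, np.eye(3), np.array([0.5, 0.0, 0.0]))
tokens = select_visible_tokens(visible, patch=2)
```

Because the returned indices live on the target frame's token grid, they could also be used to look up target-frame positional encodings for the kept tokens, which is one plausible reading of the positional alignment the abstract mentions. How the paper actually performs that alignment is not recoverable from the truncated text.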
