Probing into Camera Control of Video Models
Chen Hou, Christian Rupprecht

TL;DR
This paper reformulates camera control in video models as a set of displacement fields applied as geometric guidance. The approach enables effective control without retraining and reveals biases in existing models.
Contribution
The paper proposes a geometric guidance method for camera control in video diffusion models that requires no additional training and also serves as a probe of model capabilities.
Findings
Effective camera control is achieved with minimal quality degradation.
The probe identifies biases shared across video models.
Benchmarking reveals disparities in multi-view generation performance.
Abstract
Video is a rich and scalable source of 3D/4D visual observations, and camera control is a key capability for video generation models to produce geometrically meaningful content. Existing approaches typically learn a mapping from camera motion to video using additional camera modules and paired data. However, such datasets are often limited in scale, diversity, and scene dynamics, which can bias the model toward a narrow output distribution and compromise the strong prior learned by the base model. These limitations motivate a different perspective on camera control. In this paper, we show that camera control need not be modeled as an implicit mapping problem, but can instead be treated as a form of geometric guidance that induces displacements across frames. Specifically, we reformulate camera control into a set of displacement fields and apply them via differentiable resampling of…
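The idea of turning a camera motion into a displacement field and applying it by resampling can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is cut off before it says what is resampled, so the example simply warps a plain image array. The pinhole intrinsics `K`, the known per-pixel `depth`, and the function names `camera_displacement` and `bilinear_resample` are all assumptions made for illustration. NumPy has no autodiff, so the code only shows the arithmetic; the same bilinear lookup is differentiable when written in an autodiff framework.

```python
import numpy as np

def pixel_grid(H, W):
    """Return x and y pixel coordinates, each of shape (H, W)."""
    ys, xs = np.meshgrid(np.arange(H, dtype=np.float64),
                         np.arange(W, dtype=np.float64), indexing="ij")
    return xs, ys

def camera_displacement(depth, K, R, t):
    """Per-pixel displacement (dx, dy) induced by a relative camera pose.

    Assumes a pinhole camera with intrinsics K (3x3), known depth (H, W),
    and a pose (R, t) mapping points from the current camera frame to the
    other camera frame. All of these are illustrative assumptions.
    """
    H, W = depth.shape
    xs, ys = pixel_grid(H, W)
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)   # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                        # unproject to rays
    points = rays * depth[..., None]                       # 3D points
    moved = points @ R.T + t                               # apply camera motion
    proj = moved @ K.T                                     # reproject
    uv = proj[..., :2] / proj[..., 2:3]
    return uv - pix[..., :2]                               # (H, W, 2)

def bilinear_resample(image, disp):
    """Sample image at (x + dx, y + dy) with bilinear weights, clamped at borders."""
    H, W = image.shape[:2]
    xs, ys = pixel_grid(H, W)
    x = np.clip(xs + disp[..., 0], 0, W - 1)
    y = np.clip(ys + disp[..., 1], 0, H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    if image.ndim == 3:                                    # broadcast over channels
        wx, wy = wx[..., None], wy[..., None]
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

# Toy usage: a sideways camera translation over a fronto-parallel plane.
K = np.array([[100.0, 0.0, 4.0],
              [0.0, 100.0, 3.0],
              [0.0, 0.0, 1.0]])
depth = np.full((6, 8), 10.0)
# Horizontal shift is f * tx / depth = 100 * 0.1 / 10 = 1 pixel everywhere.
disp = camera_displacement(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
ramp = np.tile(np.arange(8.0), (6, 1))                     # image whose value equals x
warped = bilinear_resample(ramp, disp)
```

With constant depth the displacement is uniform; with varying depth, nearer pixels move further, which is the parallax a camera translation should produce.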
