CamCtrl3D: Single-Image Scene Exploration with Precise 3D Camera Control
Stefan Popov, Amit Raj, Michael Krainin, Yuanzhen Li, William T. Freeman, Michael Rubinstein

TL;DR
CamCtrl3D introduces a novel method for generating realistic fly-through videos from a single image by integrating multiple conditioning techniques and a global 3D representation, enabling precise camera control and scene exploration.
Contribution
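The paper combines four camera-conditioning techniques (raw camera extrinsics, camera-ray images, a reprojection of the initial image, and a global 3D representation built with 2D<=>3D transformers) in a ControlNet-style architecture, and proposes a metric for analyzing their trade-offs in single-image scene exploration.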
Findings
Achieves state-of-the-art results in scene exploration from a single image.
Identifies optimal combinations of conditioning techniques for best video quality.
Provides a new metric to evaluate video quality and view consistency.
Abstract
We propose a method for generating fly-through videos of a scene, from a single image and a given camera trajectory. We build upon an image-to-video latent diffusion model. We condition its UNet denoiser on the camera trajectory, using four techniques. (1) We condition the UNet's temporal blocks on raw camera extrinsics, similar to MotionCtrl. (2) We use images containing camera rays and directions, similar to CameraCtrl. (3) We reproject the initial image to subsequent frames and use the resulting video as a condition. (4) We use 2D<=>3D transformers to introduce a global 3D representation, which implicitly conditions on the camera poses. We combine all conditions in a ControlNet-style architecture. We then propose a metric that evaluates overall video quality and the ability to preserve details with view changes, which we use to analyze the trade-offs of individual and combined…
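To make technique (2) concrete, here is a minimal sketch of how a per-frame "camera ray image" could be computed from a pinhole camera. The abstract does not specify the encoding, so this is an assumption rather than the authors' implementation: each pixel stores the ray origin and unit direction in world space (6 channels). The function name `camera_ray_image` and the parameters `K` (3x3 intrinsics) and `cam_to_world` (4x4 pose) are hypothetical names chosen for illustration.

```python
import numpy as np

def camera_ray_image(K, cam_to_world, height, width):
    """Return an (H, W, 6) array holding, per pixel, the world-space
    ray origin (3 channels) and unit ray direction (3 channels).

    Assumed conventions (not taken from the paper): pinhole camera,
    pixel centers at half-integer coordinates, camera looks along +z.
    """
    # Pixel-center coordinates; u varies along columns, v along rows.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3)

    # Back-project pixels through the inverse intrinsics (camera space).
    dirs_cam = pixels @ np.linalg.inv(K).T

    # Rotate directions into world space and normalize to unit length.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    dirs_world = dirs_cam @ R.T
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)

    # Every ray of a pinhole camera starts at the camera center.
    origins = np.broadcast_to(t, dirs_world.shape)
    return np.concatenate([origins, dirs_world], axis=-1)

# Example: a 5x5 image with the principal point at the image center.
K = np.array([[4.0, 0.0, 2.5],
              [0.0, 4.0, 2.5],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[:3, 3] = [1.0, 2.0, 3.0]  # camera center in world space
rays = camera_ray_image(K, pose, height=5, width=5)
```

Evaluating this once per pose along the trajectory would yield one such image per video frame, which could then be supplied to the denoiser as a spatial condition. How CamCtrl3D actually encodes and injects the rays is not stated in the abstract.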
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
Methods: Diffusion
