Beyond Inpainting: Unleash 3D Understanding for Precise Camera-Controlled Video Generation

Dong-Yu Chen; Yixin Guo; Shuojin Yang; Tai-Jiang Mu; Shi-Min Hu

arXiv:2601.10214 · cs.CV · February 2, 2026

TL;DR

DepthDirector is a novel framework that enhances precise camera control in video generation by leveraging depth-based guidance and 3D priors, improving content consistency and visual quality.

Contribution

We introduce DepthDirector, a depth-guided video re-rendering method with a dual-stream mechanism and a lightweight adapter, along with a large multi-camera dataset for improved 3D understanding in video synthesis.

Findings

01 Outperforms existing methods in camera controllability

02 Achieves higher visual quality and content consistency

03 Demonstrates effective use of depth guidance and 3D priors

Abstract

Camera control has been extensively studied in conditioned video generation; however, precisely altering the camera trajectory while faithfully preserving the video content remains a challenging task. The mainstream approach to achieving precise camera control is warping a 3D representation according to the target trajectory. However, such methods fail to fully leverage the 3D priors of video diffusion models (VDMs) and often fall into the Inpainting Trap, resulting in subject inconsistency and degraded generation quality. To address this problem, we propose DepthDirector, a video re-rendering framework with precise camera controllability. By leveraging the depth video from an explicit 3D representation as camera-control guidance, our method can faithfully reproduce the dynamic scene of an input video under novel camera trajectories. Specifically, we design a View-Content…
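To make the idea of depth under a novel camera concrete, here is a minimal sketch of reprojecting a single depth map into a new camera pose. It is not the authors' implementation: the function name `reproject_depth`, the pinhole parameters `fx`, `fy`, `cx`, `cy`, and the assumption that both views share the same intrinsics are all hypothetical choices made for illustration.

```python
import math

def reproject_depth(depth, fx, fy, cx, cy, R, t):
    """Warp a source-view depth map into a target camera (illustrative sketch).

    depth: H x W nested list of positive z-depths in the source camera.
    fx, fy, cx, cy: pinhole intrinsics, assumed shared by both cameras.
    R, t: 3x3 rotation and length-3 translation taking source-camera
          coordinates to target-camera coordinates.
    Returns an H x W nested list; pixels no point lands on stay math.inf.
    """
    H, W = len(depth), len(depth[0])
    out = [[math.inf] * W for _ in range(H)]
    for v in range(H):
        for u in range(W):
            z = depth[v][u]
            if not (z > 0 and math.isfinite(z)):
                continue  # skip invalid source depths
            # Unproject the pixel to a 3D point in the source camera frame.
            p = ((u - cx) / fx * z, (v - cy) / fy * z, z)
            # Apply the rigid transform into the target camera frame.
            q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
            if q[2] <= 1e-6:
                continue  # point is behind (or on) the target camera plane
            # Project into the target image and round to the nearest pixel.
            ut = round(fx * q[0] / q[2] + cx)
            vt = round(fy * q[1] / q[2] + cy)
            if 0 <= ut < W and 0 <= vt < H:
                # Z-buffer: keep the nearest surface at each target pixel.
                out[vt][ut] = min(out[vt][ut], q[2])
    return out

# Usage: a flat wall 2 units away, viewed after moving the camera 0.5 forward.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
wall = [[2.0] * 4 for _ in range(4)]
moved = reproject_depth(wall, 50.0, 50.0, 2.0, 2.0, identity, [0.0, 0.0, -0.5])
```

In this toy setup, the pixels of `moved` that remain `math.inf` are regions that no source point covers after the camera move. Holes of this kind are what a pipeline that warps colour content must fill in.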

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Visual Attention and Saliency Detection