VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangru Huang, Yanwei Fu

TL;DR
VidCRAFT3 is a unified image-to-video generation framework that supports precise, joint control over camera motion, object motion, and lighting direction. It addresses two obstacles that led earlier methods to handle these signals separately: scarce jointly annotated data and mismatched control spaces.
Contribution
The paper presents an integrated approach built from three core components, together with a new dataset, that enables joint control of camera motion, object motion, and lighting direction in image-to-video generation.
Findings
Outperforms existing methods in control accuracy
Achieves higher visual coherence in generated videos
Demonstrates robustness with limited joint annotations
Abstract
Controllable image-to-video (I2V) generation transforms a reference image into a coherent video guided by user-specified control signals. In content creation workflows, precise and simultaneous control over camera motion, object motion, and lighting direction enhances both accuracy and flexibility. However, existing approaches typically treat these control signals separately, largely due to the scarcity of datasets with high-quality joint annotations and mismatched control spaces across modalities. We present VidCRAFT3, a unified and flexible I2V framework that supports both independent and joint control over camera motion, object motion, and lighting direction by integrating three core components. Image2Cloud reconstructs a 3D point cloud from the reference image to enable precise camera motion control. ObjMotionNet encodes sparse object trajectories into multi-scale optical flow…
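The abstract excerpt does not spell out how Image2Cloud works, so the following is only a general illustration of why a point cloud helps with camera control, not the authors' method. It is a minimal sketch that assumes a pinhole camera and a known per-pixel depth map: pixels are lifted to 3D, moved by a rigid camera transform, and projected back to the image plane. All function names, the intrinsics (fx, fy, cx, cy), and the toy depth values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def unproject(depth: Sequence[Sequence[float]],
              fx: float, fy: float, cx: float, cy: float) -> List[Point3]:
    """Lift every pixel (u, v) with depth z to a 3D point in the camera frame."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def transform(points: Sequence[Point3],
              rotation: Sequence[Sequence[float]],
              translation: Sequence[float]) -> List[Point3]:
    """Apply x' = R x + t, expressing the points in the new camera's frame."""
    moved = []
    for p in points:
        moved.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return moved


def project(points: Sequence[Point3],
            fx: float, fy: float, cx: float, cy: float) -> List[Pixel]:
    """Pinhole projection; points at or behind the camera plane are skipped."""
    return [(fx * x / z + cx, fy * y / z + cy) for x, y, z in points if z > 0]


# Toy 2x2 depth map (an assumption for illustration, not data from the paper).
depth = [[2.0, 2.0], [2.0, 2.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

cloud = unproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
# Shift the points by one unit along x in the camera frame.
new_view = project(transform(cloud, identity, [1.0, 0.0, 0.0]),
                   fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Because every pixel carries a 3D position, a user-specified camera trajectory turns into a sequence of such rigid transforms, and each frame's pixel locations follow from plain geometry rather than from a learned guess.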
Taxonomy
Topics: Advanced Vision and Imaging · CCD and CMOS Imaging Sensors · Computer Graphics and Visualization Techniques
Methods: Attention Is All You Need · ADaptive gradient method with the OPTimal convergence rate · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax · Dropout · Absolute Position Encodings · Label Smoothing
