IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Ronald Clark, Ming-Hsuan Yang

TL;DR
IllumiCraft introduces a unified diffusion framework that integrates lighting, appearance, and geometry cues for controllable, high-fidelity, and temporally coherent video generation from various inputs.
Contribution
It is the first to combine HDR lighting, appearance cues, and 3D geometry in a diffusion model for controllable video synthesis.
Findings
Supports background-conditioned and text-conditioned relighting.
Achieves better fidelity than existing controllable video generation methods.
Generates temporally coherent videos aligned with user prompts.
Abstract
Although diffusion-based models can generate high-quality and high-resolution video sequences from textual or image inputs, they lack explicit integration of geometric cues when controlling scene lighting and visual appearance across frames. To address this limitation, we propose IllumiCraft, an end-to-end diffusion framework accepting three complementary inputs: (1) high-dynamic-range (HDR) video maps for detailed lighting control; (2) synthetically relit frames with randomized illumination changes (optionally paired with a static background reference image) to provide appearance cues; and (3) 3D point tracks that capture precise 3D geometry information. By integrating the lighting, appearance, and geometry cues within a unified diffusion architecture, IllumiCraft generates temporally coherent videos aligned with user-defined prompts. It supports background-conditioned and…
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Advanced Vision and Imaging · Advanced Numerical Analysis Techniques
Methods: Diffusion
