Light-X: Generative 4D Video Rendering with Camera and Illumination Control

Tianqi Liu; Zhaoxi Chen; Zihao Huang; Shaocong Xu; Saining Zhang; Chongjie Ye; Bohan Li; Zhiguo Cao; Wei Li; Hao Zhao; Ziwei Liu

arXiv:2512.05115·cs.CV·December 16, 2025

TL;DR

Light-X is a novel framework for controllable 4D video rendering that jointly manages camera and illumination, enabling high-quality, dynamic scene synthesis from monocular videos with disentangled geometry and lighting cues.

Contribution

The paper introduces Light-X, a new generative model that combines explicit geometry and lighting disentanglement with a novel training pipeline, advancing controllable 4D video synthesis.

Findings

01

Outperforms baseline methods in joint camera and illumination control

02

Achieves superior relighting quality under various conditions

03

Successfully synthesizes diverse dynamic scenes from monocular footage

Abstract

Recent advances in illumination control extend image-based methods to video, yet they still face a trade-off between lighting fidelity and temporal consistency. Moving beyond relighting, a key step toward generative modeling of real-world scenes is the joint control of camera trajectory and illumination, since visual dynamics are inherently shaped by both geometry and lighting. To this end, we present Light-X, a video generation framework that enables controllable rendering from monocular videos with both viewpoint and illumination control. 1) We propose a disentangled design that decouples geometry and lighting signals: geometry and motion are captured via dynamic point clouds projected along user-defined camera trajectories, while illumination cues are provided by a relit frame consistently projected into the same geometry. These explicit, fine-grained cues enable effective…
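The abstract describes geometry cues obtained by projecting dynamic point clouds along a user-defined camera trajectory. As a rough illustration of that idea only, the sketch below projects one frame's points into a single target view with a pinhole camera and a nearest-point z-buffer, returning a depth map and a visibility mask. It is not the Light-X implementation: the function `project_points`, its parameters, the centered principal point, and the rounding to the nearest pixel are all assumptions made for this example.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def project_points(
    points: Sequence[Point],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    focal: float,
    width: int,
    height: int,
) -> Tuple[List[List[Optional[float]]], List[List[bool]]]:
    """Project world-space points into a pinhole camera (hypothetical helper).

    Returns a depth map (None where no point lands) and a visibility mask.
    Assumes the principal point sits at the image center.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    depth: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for x, y, z in points:
        # World -> camera coordinates: p_cam = R @ p_world + t
        xc = rotation[0][0] * x + rotation[0][1] * y + rotation[0][2] * z + translation[0]
        yc = rotation[1][0] * x + rotation[1][1] * y + rotation[1][2] * z + translation[1]
        zc = rotation[2][0] * x + rotation[2][1] * y + rotation[2][2] * z + translation[2]
        if zc <= 0:
            continue  # point is behind the camera
        u = int(round(focal * xc / zc + cx))
        v = int(round(focal * yc / zc + cy))
        if 0 <= u < width and 0 <= v < height:
            # z-buffer: keep only the nearest point per pixel
            if depth[v][u] is None or zc < depth[v][u]:
                depth[v][u] = zc
    mask = [[d is not None for d in row] for row in depth]
    return depth, mask


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cloud = [(0.0, 0.0, 2.0), (0.0, 0.0, 4.0), (1.0, 0.0, 2.0), (0.0, 0.0, -1.0)]
depth_map, visible = project_points(cloud, IDENTITY, (0.0, 0.0, 0.0), 2.0, 5, 5)
```

In this toy cloud, the point at depth 4 is hidden behind the point at depth 2 and the point with negative depth is discarded, so only two pixels are marked visible. Repeating the projection for each pose along a trajectory would yield a sequence of partial renders and masks of the kind a generative model could then complete.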

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

* Light-X aims to achieve the very challenging task of disentangling geometry and illumination for dynamic scenes, yet it manages to achieve quite impressive visual results, as shown in the provided video.
* This work also introduces an easy-to-setup data curation pipeline for creating paired original and relighted videos of geometrically coherent dynamic scenes.
* The manuscript is well-structured and easy to follow.

Weaknesses

* **Limited technical contribution within a complex framework.** While the controlled and reasonable generation results are impressive, the overall system appears to be a loose combination of prior works like TrajectoryCrafter and IC-Light. The authors should better detail why jointly controlling both camera and illumination is an important task and which specific module Light-X introduces to better fit this combined task.
* **Need for direct geometry comparison and evaluation.** The paper clai

Reviewer 02 · Rating 8 · Confidence 3

Strengths

- First paper to tackle the novel problem of video generation with joint camera and illumination control for monocular videos by providing conditioning to a diffusion transformer.
- The Light-Syn pipeline uses an effective degradation idea for training data creation. The data sources comprise static scenes, dynamic scenes, and AI-generated videos with ablations justifying the significance of each source.
- Light-DiT layer allows global illumination control by using a Q-Former to prevent diminish

Weaknesses

No major weaknesses. Minor weakness: the evaluation metric, FID between the output images and IC-Light results, carries a potential bias, because the model is judged on its ability to mimic the behavior of a component (IC-Light) used in its own conditioning scheme.
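To make the concern about reference-anchored FID concrete, here is a minimal sketch of a Fréchet distance between two feature sets. It is a simplification under stated assumptions: real FID uses Inception features and full covariance matrices with a matrix square root, whereas this version assumes diagonal covariance, for which the distance reduces to the squared mean difference plus the squared difference of per-dimension standard deviations. The names `mean_and_std` and `frechet_distance_diag` and the toy feature vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mean_and_std(features: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and (population) standard deviation."""
    n = len(features)
    dim = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n)
        for d in range(dim)
    ]
    return means, stds


def frechet_distance_diag(
    a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]
) -> float:
    """Squared Fréchet distance between Gaussians fitted to a and b.

    Assumes diagonal covariance, so the trace term simplifies to
    sum((sigma_a - sigma_b) ** 2) over dimensions.
    """
    mu_a, sd_a = mean_and_std(a)
    mu_b, sd_b = mean_and_std(b)
    mean_term = sum((x - y) ** 2 for x, y in zip(mu_a, mu_b))
    cov_term = sum((x - y) ** 2 for x, y in zip(sd_a, sd_b))
    return mean_term + cov_term


# Toy features: the "reference" plays the role of the relighting outputs
# used as the comparison target; values are made up for illustration.
reference = [[0.0, 0.0], [2.0, 2.0]]
mimic = [[0.0, 0.0], [2.0, 2.0]]
different_style = [[5.0, 5.0], [7.0, 7.0]]

score_mimic = frechet_distance_diag(reference, mimic)
score_other = frechet_distance_diag(reference, different_style)
```

The metric only measures closeness to the reference distribution: a method that reproduces the reference's statistics scores zero, while an equally plausible output with a different look is penalized, which is the bias the reviewers describe.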

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- The proposed factorization method supplies the model with fine-grained, geometry-aligned cues (projected source views/masks and projected relit views/masks) and complementary global illumination tokens (Q-Former), which are technically sound and easy to reason about.
- For the new joint control task (no direct prior), the paper composes reasonable baselines from camera-control and relighting methods, and also introduces a tailored training-free baseline with documented adaptations.
- The evalu

Weaknesses

- For joint control, FID is computed against IC-Light-relit outputs on a TrajectoryCrafter sequence. That anchors “ground truth look” to IC-Light’s aesthetics and may bias the metric toward that method’s style.
- The related work mentions several recent camera/lighting control or camera-controlled generators (e.g., VidCraft3, ReCamMaster, CAMI2V, Free4D/VD3D) that appear not to be included in quantitative comparisons. A brief justification or attempts to compare to the most recent DiT-based came

Code & Models

Datasets

tqliu/Light-Syn
dataset · 96 dl

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Computer Graphics and Visualization Techniques