TL;DR
Relit-LiVE is a video relighting framework that produces physically consistent, temporally stable results without requiring known camera poses, by jointly predicting relit videos and environment maps.
Contribution
The method explicitly incorporates raw reference images and proposes a joint environment video prediction formulation to improve physical consistency and handle dynamic lighting and camera motion.
Findings
Outperforms state-of-the-art methods on synthetic and real-world benchmarks.
Supports dynamic lighting and camera motion with improved physical consistency.
Enables downstream applications like scene rendering and object insertion.
Abstract
Recent advances have shown that large-scale video diffusion models can be repurposed as neural renderers by first decomposing videos into intrinsic scene representations and then performing forward rendering under novel illumination. While promising, this paradigm fundamentally relies on accurate intrinsic decomposition, which remains highly unreliable for real-world videos and often leads to distorted appearances, broken materials, and accumulated temporal artifacts during relighting. In this work, we present Relit-LiVE, a novel video relighting framework that produces physically consistent, temporally stable results without requiring prior knowledge of camera pose. Our key insight is to explicitly introduce raw reference images into the rendering process, enabling the model to recover critical scene cues that are inevitably lost or corrupted in intrinsic representations. Furthermore,…
