ReLi3D: Relightable Multi-view 3D Reconstruction with Disentangled Illumination
Jan-Niklas Dihlmann, Mark Boss, Simon Donne, Andreas Engelhardt, Hendrik P.A. Lensch, Varun Jampani

TL;DR
ReLi3D introduces a fast, unified end-to-end pipeline that reconstructs detailed 3D geometry, materials, and illumination from sparse multi-view images, enabling near-instantaneous relightable 3D asset creation.
Contribution
It is the first to unify geometry, material, and illumination reconstruction into a single pipeline using multi-view constraints and a transformer architecture.
Findings
Achieves reconstruction in under one second.
Demonstrates high accuracy in geometry, materials, and illumination.
Generalizes well across synthetic and real-world data.
Abstract
Reconstructing 3D assets from images has long required separate pipelines for geometry reconstruction, material estimation, and illumination recovery, each with distinct limitations and computational overhead. We present ReLi3D, the first unified end-to-end pipeline that simultaneously reconstructs complete 3D geometry, spatially-varying physically-based materials, and environment illumination from sparse multi-view images in under one second. Our key insight is that multi-view constraints can dramatically improve material and illumination disentanglement, a problem that remains fundamentally ill-posed for single-image methods. Key to our approach is the fusion of the multi-view input via a transformer cross-conditioning architecture, followed by a novel unified two-path prediction strategy. The first path predicts the object's structure and appearance, while the second path predicts…
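The abstract breaks off before it describes the second path. The reviewer comments below indicate that the second path handles illumination. The NumPy sketch below shows what a two-path output stage could look like, assuming both paths read one shared set of fused feature tokens. Path 1 decodes per-token geometry and material values. Path 2 pools all tokens into a single HDR environment map. This is not the authors' implementation. The token and feature sizes, the SDF-style geometry output, the five material channels, the log-space radiance decoding, and the random weights standing in for trained layers are all hypothetical.

```python
import numpy as np

# Hypothetical sizes chosen for illustration; none are taken from the paper.
TOKENS, DIM = 64, 32      # number of fused feature tokens and their width
N_MATERIAL = 5            # e.g. RGB albedo + roughness + metallic (assumed layout)
ENV_H, ENV_W = 8, 16      # toy equirectangular environment-map resolution

rng = np.random.default_rng(0)

def linear(x, w, b):
    return x @ w + b

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Random stand-ins for trained layers.
# Path 1: per-token geometry (an SDF-like scalar, assumed) and material values.
w_geo, b_geo = rng.normal(size=(DIM, 1)) * 0.1, np.zeros(1)
w_mat, b_mat = rng.normal(size=(DIM, N_MATERIAL)) * 0.1, np.zeros(N_MATERIAL)
# Path 2: one global illumination estimate for the whole scene.
w_env = rng.normal(size=(DIM, ENV_H * ENV_W * 3)) * 0.1
b_env = np.zeros(ENV_H * ENV_W * 3)

def two_path_predict(fused):
    """fused: (TOKENS, DIM) features shared by both paths."""
    # Path 1: local outputs, one prediction per token.
    sdf = linear(fused, w_geo, b_geo)[:, 0]                # (TOKENS,)
    materials = sigmoid(linear(fused, w_mat, b_mat))        # (TOKENS, N_MATERIAL), in (0, 1)
    # Path 2: reduce over all tokens, then decode a single environment map.
    pooled = fused.mean(axis=0)                             # (DIM,)
    log_radiance = linear(pooled, w_env, b_env)             # unbounded log-space values
    env_map = np.exp(log_radiance).reshape(ENV_H, ENV_W, 3)  # strictly positive, HDR-capable
    return sdf, materials, env_map

fused = rng.normal(size=(TOKENS, DIM))
sdf, materials, env_map = two_path_predict(fused)
print(sdf.shape, materials.shape, env_map.shape)
```

The sketch illustrates one design choice. Materials vary per surface point, so path 1 stays local. Illumination is a single property of the whole scene, so path 2 reduces over all tokens before decoding. This summary does not say whether ReLi3D pools features in this way.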
Peer Reviews
Decision: ICLR 2026 Poster
1. The two-path feed-forward framework jointly reconstructs geometry, materials, and illumination under multi-view constraints, showing a clear and coherent design.
2. Uses Monte Carlo integration with multiple importance sampling (MIS) for training supervision, leading to more consistent and realistic reconstructions.
3. The model runs efficiently and shows some degree of cross-domain generalization with mixed synthetic–real training.
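The reviewer's second point refers to Monte Carlo integration with multiple importance sampling. The sketch below is a generic illustration of MIS, not ReLi3D's renderer. It combines two sampling strategies for a toy 1D integral using Veach's balance heuristic. In physically based rendering, the two strategies would typically be environment-light sampling and BRDF sampling. Here they are a uniform density and p(x) = 2x, and the integrand 3x² (exact integral 1 on [0, 1]) is an arbitrary stand-in.

```python
import numpy as np

def f(x):
    # Toy integrand on [0, 1]; its exact integral is 1.
    return 3.0 * x**2

def pdf_uniform(x):
    # Strategy A: p_a(x) = 1 on [0, 1].
    return np.ones_like(x)

def pdf_linear(x):
    # Strategy B: p_b(x) = 2x on [0, 1], sampled via the inverse CDF x = sqrt(u).
    return 2.0 * x

def mis_estimate(n_a, n_b, rng):
    x_a = rng.random(n_a)            # n_a samples drawn from strategy A
    x_b = np.sqrt(rng.random(n_b))   # n_b samples drawn from strategy B
    total = 0.0
    for x in (x_a, x_b):
        # Balance heuristic: w_i(x) * f(x) / (n_i * p_i(x)) simplifies to
        # f(x) / (n_a * p_a(x) + n_b * p_b(x)) for samples from either strategy.
        denom = n_a * pdf_uniform(x) + n_b * pdf_linear(x)
        total += np.sum(f(x) / denom)
    return total

rng = np.random.default_rng(0)
estimate = mis_estimate(50_000, 50_000, rng)
print(f"MIS estimate: {estimate:.4f} (exact value: 1.0)")
```

Under the balance heuristic, each sample's weighted contribution reduces to f(x) / (n_a p_a(x) + n_b p_b(x)). A sample is therefore down-weighted wherever the other strategy also has high density. This weighting keeps the combined estimate unbiased while suppressing the variance spikes that either strategy would produce alone.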
1. Limited evaluation diversity. The test data mostly covers diffuse or moderately lit objects. The paper lacks challenging cases such as metallic or transparent materials, or strong HDR illumination, where disentanglement performance would be most critical.
2. Lack of illumination disentanglement evaluation. The paper does not provide a quantitative evaluation of the predicted lighting quality (e.g., comparison against SPAR3D or DiffusionLight) or at least sufficient qualitative examples demonstra…
1. The writing is clear and easy to follow.
2. The proposed pipeline is new and makes a meaningful contribution to the field.
Please see the weaknesses for details.
To my knowledge, the suggested approach is the first that jointly reconstructs mesh, PBR materials, **and** HDR environment, all in a feed-forward manner and at impressive speed. To me, this constitutes a significant contribution. Additionally, the paper contains novel ideas (more on that below) and is clear and well written.
1. The idea of fusing an arbitrary number of views, with one "hero" view and the other views combined via latent mixing (what the authors call "cross-view feature fusion"), is novel and insightful…
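The reviewer describes fusing an arbitrary number of views into one "hero" view. The sketch below shows one way to get that property. It assumes single-head cross-attention in which hero-view tokens query the pooled tokens of all other views; the paper's actual "latent mixing" may work differently. The class name CrossViewFusion, the token counts, and the random projection matrices are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class CrossViewFusion:
    """Hypothetical single-head cross-attention: hero-view tokens attend to all other views."""

    def __init__(self, dim, rng):
        scale = 1.0 / np.sqrt(dim)
        # Random stand-ins for learned query/key/value projections.
        self.w_q = rng.normal(size=(dim, dim)) * scale
        self.w_k = rng.normal(size=(dim, dim)) * scale
        self.w_v = rng.normal(size=(dim, dim)) * scale
        self.dim = dim

    def __call__(self, hero, others):
        """hero: (T, D) tokens; others: list of (T_i, D) arrays, any number of views."""
        if not others:
            return hero                                # single-view input: nothing to fuse
        context = np.concatenate(others, axis=0)       # (sum of T_i, D) pooled context tokens
        q = hero @ self.w_q
        k = context @ self.w_k
        v = context @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(self.dim))    # (T, sum of T_i); each row sums to 1
        return hero + attn @ v                         # residual keeps the hero-view token layout

rng = np.random.default_rng(0)
fusion = CrossViewFusion(dim=32, rng=rng)
hero = rng.normal(size=(64, 32))
for n_other in (0, 1, 5):
    others = [rng.normal(size=(64, 32)) for _ in range(n_other)]
    print(n_other, fusion(hero, others).shape)
```

Attention is a weighted sum over the pooled context tokens. As a result, the output keeps the hero view's token layout for any number of extra views. It also does not depend on the order in which those views are passed.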
1. The most important ablation is missing: whether to use multiple paths (one for geometry and appearance, another for illumination) or to compute everything in a single path.
2. While the method was trained and tested on real-world or synthetic data, it would be valuable to see how it generalizes to generated images (e.g., from diffusion or flow models). This might improve the practicality of the approach.
3. While the 3D+Image metrics (Table 2) look convincing at first, qualitative results in…
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Advanced Vision and Imaging · 3D Shape Modeling and Analysis
