3D Reconstruction with Generalizable Neural Fields using Scene Priors
Yang Fu, Shalini De Mello, Xueting Li, Amey Kulkarni, Jan Kautz, Xiaolong Wang, Sifei Liu

TL;DR
This paper introduces a generalizable neural field approach that uses scene priors to efficiently reconstruct 3D scenes from limited views, enabling fast adaptation and novel-view synthesis without scene-specific training.
Contribution
The proposed NFP method is the first to incorporate scene priors into neural fields for scalable, flexible 3D reconstruction from single or few views.
Findings
Achieves state-of-the-art reconstruction quality and efficiency.
Supports single-image novel-view synthesis.
Enables fast adaptation to new scenes with limited data.
Abstract
High-fidelity 3D scene reconstruction has been substantially advanced by recent progress in neural fields. However, most existing methods train a separate network from scratch for each individual scene. This approach is neither scalable nor efficient, and it yields poor results given limited views. While learning-based multi-view stereo methods alleviate this issue to some extent, their multi-view setting makes them less flexible to scale up and to apply broadly. Instead, we introduce generalizable Neural Fields incorporating scene Priors (NFPs). The NFP network maps any single-view RGB-D image into signed distance and radiance values. A complete scene can be reconstructed by merging individual frames in the volumetric space without a fusion module, which provides better flexibility. The scene priors can be trained on large-scale datasets, allowing for fast adaptation to the…
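The abstract's idea of merging individual frames in the volumetric space without a fusion module can be illustrated geometrically. The following is a minimal sketch, not the authors' implementation: it assumes a pinhole camera with intrinsics `K` and a 4x4 camera-to-world pose per frame, and the function names `backproject` and `merge_frames` are hypothetical. Each depth map is lifted to world-space points, and the per-frame point sets are simply concatenated. The per-point image features that a learned model would carry along are omitted here.

```python
import numpy as np

def backproject(depth, K, cam_to_world):
    """Lift a depth map (H, W) to world-space points (N, 3), skipping depths <= 0."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]               # pixel row (v) and column (u) indices
    valid = depth > 0                       # assumption: non-positive depth means "missing"
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]  # pinhole model: (u - cx) * z / fx
    y = (v[valid] - K[1, 2]) * z / K[1, 1]  # pinhole model: (v - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # homogeneous, (N, 4)
    return (pts_cam @ cam_to_world.T)[:, :3]

def merge_frames(frames):
    """Merge frames by concatenating their world-space points (no learned fusion)."""
    return np.concatenate([backproject(d, K, T) for d, K, T in frames], axis=0)

# Usage: two 2x2 frames, the second camera shifted by 1 unit along x.
K = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
pose_a = np.eye(4)
pose_b = np.eye(4)
pose_b[0, 3] = 1.0
depth = np.full((2, 2), 2.0)
merged = merge_frames([(depth, K, pose_a), (depth, K, pose_b)])
print(merged.shape)
```

Because the merge is a plain concatenation in a shared world frame, adding another frame does not require retraining or re-running anything on the frames already merged, which is one plausible reading of the flexibility the abstract claims.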
Peer Reviews
Decision·ICLR 2024 poster
1) Generalized framework for scene reconstruction using radiance fields (i.e., no per-scene training) from relatively few input views (relative to existing literature) 2) Ability to reconstruct a 3D scene by merging individual frames in the volumetric space without a learnable fusion module 3) Novel view synthesis from single-view input beats existing works 4) Simple interpolation strategy for obtaining point features. Making use of surface points instead of dense volumetric grids for obtaini
1) The dependency on depth maps is limiting since such data is not always available. This is also a drawback of MonoSDF and other works that use additional priors for scene reconstruction, beyond just RGB images 2) The intuition behind the approximation of GT SDF values by observing depth values along a ray is unclear. This may result in erroneous signed distance predictions. An explanation of this is lacking. I am actually interested in this ablation experiment 3) Results on images in the wild
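The reviewer's second point, approximating ground-truth SDF values from depth observed along a ray, can be made concrete with a small sketch. This shows one common approximation and is not necessarily the paper's exact formulation: for a sample at distance `t` along a ray whose observed surface lies at distance `surface_dist`, the signed distance is taken as `surface_dist - t` and clipped to a truncation band. The function name and the `truncation` parameter are hypothetical.

```python
import numpy as np

def approx_sdf_along_ray(t_samples, surface_dist, truncation=0.1):
    """Approximate signed distance for samples along one ray.

    Positive in front of the observed surface, negative behind it,
    clipped to [-truncation, truncation].
    """
    sdf = surface_dist - np.asarray(t_samples, dtype=float)
    return np.clip(sdf, -truncation, truncation)

# Usage: surface observed 1.0 unit along the ray.
values = approx_sdf_along_ray([0.9, 1.0, 1.05, 2.0], surface_dist=1.0)
print(values)
```

The distance along a ray is only an upper bound on the true distance to the nearest surface, so the approximation is most accurate close to the surface and for rays hitting it roughly head-on. At grazing angles or near other geometry it overestimates, which is consistent with the reviewer's worry about erroneous signed distance predictions.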
+ ## Readability. As it currently stands, the paper is very well written. The main ideas and concepts are mostly well explained and articulated throughout. + ## Organization of the contents and overall paper structure. The contents are also very well structured and balanced. + ## Related work section and discussion. It is very well structured, articulated and populated with very relevant and up to date references. + ## The disclosed performance of the proposed method is at the very least comp
+ ## 1. Missing bits of context information - How much does it cost? While indicative timings and thorough implementation details (in supMat) are provided, information regarding the resource usage, model size and complexity is still underdescribed. A comparative disclosure of such information covering the main experimental baselines that are considered would help the reader better assess its relative positioning throughout the typical criteria. Mentioning where the computation bottlenecks lie
(1) Overall, the paper is well written and easy to follow. (2) The paper demonstrates comprehensive experiments and compares with different state-of-the-art (SOTA) methods, to highlight the advantage of the proposed method.
(1) To me, the novelty of this paper is limited. The key design of the geometry prior module is similar to PointNeRF (also employs a distance-wise feature aggregation from 3D point clouds). Also, learning a geometric prior (SDF) then pruning to facilitate the texture field is applied in previous neural 3D reconstruction methods such as NeRFusion and SparseNeuS. (2) The paper only conducts experiments on RGB-D sequences to demonstrate the generalizability. To me, the technical impact would be m
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Optical measurement and interference techniques
