LEAP: Liberate Sparse-view 3D Modeling from Camera Poses
Hanwen Jiang, Zhenyu Jiang, Yue Zhao, Qixing Huang

TL;DR
LEAP introduces a pose-free 3D modeling method that learns geometric priors directly from data. It outperforms pose-dependent methods, especially when those methods are given noisy or predicted camera poses, and runs significantly faster.
Contribution
LEAP is the first approach to perform sparse-view 3D modeling without relying on camera poses, using a neural volume that encodes geometry and texture priors learned from data.
Findings
LEAP outperforms pose-dependent methods that use predicted poses.
LEAP matches the performance of pose-based methods that use ground-truth poses.
LEAP runs 400 times faster than PixelNeRF.
Abstract
Are camera poses necessary for multi-view 3D modeling? Existing approaches predominantly assume access to accurate camera poses. While this assumption might hold for dense views, accurately estimating camera poses for sparse views is often elusive. Our analysis reveals that noisy estimated poses lead to degraded performance for existing sparse-view 3D modeling methods. To address this issue, we present LEAP, a novel pose-free approach, therefore challenging the prevailing notion that camera poses are indispensable. LEAP discards pose-based operations and learns geometric knowledge from data. LEAP is equipped with a neural volume, which is shared across scenes and is parameterized to encode geometry and texture priors. For each incoming scene, we update the neural volume by aggregating 2D image features in a feature-similarity-driven manner. The updated neural volume is decoded into the…
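The feature-similarity-driven update can be illustrated with a minimal sketch. It assumes the update behaves like single-head scaled dot-product cross-attention: each embedding of the shared neural volume acts as a query, and 2D features pooled from all input views act as keys and values. The function names, the residual update, and the toy dimensions are hypothetical choices for illustration, not LEAP's actual layers. No camera pose enters the computation; feature similarity alone decides which 2D features each voxel draws from.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def aggregate_features(volume: List[Vector], image_feats: List[Vector]) -> List[Vector]:
    """Update each voxel query with a similarity-weighted average of 2D features.

    volume: shared, learned voxel embeddings (hypothetical queries), one per voxel.
    image_feats: flattened 2D features from all input views (keys and values).
    The weights depend only on feature similarity, never on camera poses.
    """
    dim = len(volume[0])
    scale = 1.0 / math.sqrt(dim)  # scaled dot-product attention
    updated = []
    for query in volume:
        weights = softmax([dot(query, f) * scale for f in image_feats])
        # Convex combination of the 2D features, weighted by similarity.
        fused = [sum(w * f[i] for w, f in zip(weights, image_feats)) for i in range(dim)]
        # Assumed residual update: keep the scene-agnostic prior stored in the query.
        updated.append([q + v for q, v in zip(query, fused)])
    return updated


if __name__ == "__main__":
    volume = [[1.0, 0.0], [0.0, 1.0]]  # two toy voxel queries
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three toy 2D features
    print(aggregate_features(volume, feats))
```

Because the weights depend only on feature similarity, reordering the input features (for example, listing the views in a different order) leaves the result unchanged up to floating-point rounding.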
Peer Reviews
Decision: ICLR 2024 poster
++ As a pose-free approach, LEAP discards pose-based operations and learns geometric knowledge from data. ++ LEAP is equipped with a neural volume, which is shared across scenes and is parameterized to encode geometry and texture priors. For each incoming scene, it updates the neural volume by aggregating 2D image features in a feature-similarity-driven manner. The updated neural volume is decoded into the radiance field, enabling novel view synthesis from any viewpoint. ++ The experimental e
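Rendering the decoded radiance field from a novel viewpoint can be sketched with standard NeRF-style alpha compositing along a single ray. This sketch assumes the decoder has already produced a density and an RGB color at each sample point and that rendering follows this conventional quadrature; the function name and inputs are hypothetical, not taken from LEAP's code.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def render_ray(densities: Sequence[float], colors: Sequence[RGB], deltas: Sequence[float]) -> RGB:
    """Alpha-composite samples ordered from near to far along one camera ray.

    densities: volume density sigma_i at each sample (>= 0).
    colors: RGB color c_i at each sample.
    deltas: distance delta_i between consecutive samples.
    """
    transmittance = 1.0  # T_i: fraction of light not yet absorbed before sample i
    rgb: List[float] = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        for ch in range(3):
            rgb[ch] += weight * color[ch]
        transmittance *= 1.0 - alpha
    return (rgb[0], rgb[1], rgb[2])


if __name__ == "__main__":
    # A semi-transparent red sample in front of an opaque green one.
    print(render_ray([0.5, 50.0], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [1.0, 1.0]))
```

No background color is composited here, so a ray through empty space renders black; practical renderers often add a background term.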
-- Novel view synthesis is defined as rendering images at a specific camera pose (and time, for dynamic scenes). When camera poses are unavailable or, as in this paper, not estimated, how is NVS performed for given camera poses, i.e., how are the given camera poses aligned with the training-set images? -- Essentially, the method incorporates feature correspondences into the overall optimization, so it would be interesting to compare it with correspondences estimated from optical flow whe
1. The idea is novel. Unlike previous methods that try to predict or estimate camera poses in sparse views, the proposed method does not use camera poses at all to build the 3D volume representation. 2. The paper is well-written. It is easy to read and understand the motivation, background, problem, and high-level ideas for addressing the challenge. 3. The proposed multi-view encoder and the 2D-3D information mapping layers are novel, and their efficacy has been demonstrated in the ablation study. 4.
1. Unlike traditional pose-based projection, the proposed 2D-3D mapping layers perform a weighted fusion of 2D features. This mapping may be more robust than pose-based projection when the pose is inaccurate, but its accuracy is limited. Consequently, the reconstruction and rendering are often blurry, as shown in Figure 6. Do the authors have ideas to improve it? 2. Sparse-view reconstruction is an ill-posed problem because the input images contain incomplete scene information. Although the proposed m
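The reviewer's first point contrasts LEAP's weighted fusion with pose-based projection, as used by PixelNeRF-style methods to sample image features for a 3D point. For contrast, here is a minimal sketch of that projection step, assuming a standard pinhole camera with intrinsics K and a world-to-camera pose (R, t). The function and argument names are hypothetical. Any error in (R, t) moves the sampled pixel, which is the sensitivity to noisy estimated poses that the abstract describes.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def matvec(m: Matrix3, v: Sequence[float]) -> List[float]:
    # 3x3 matrix times 3-vector.
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def project_point(
    point_world: Sequence[float],
    rotation: Matrix3,
    translation: Sequence[float],
    intrinsics: Matrix3,
) -> Tuple[float, float]:
    """Project a world-space point to pixel coordinates with a pinhole camera.

    rotation, translation: world-to-camera pose, so X_cam = R X_world + t.
    intrinsics: 3x3 calibration matrix K.
    """
    cam = [c + t for c, t in zip(matvec(rotation, point_world), translation)]
    if cam[2] <= 0.0:
        raise ValueError("point lies behind the camera")
    u, v, w = matvec(intrinsics, cam)
    return (u / w, v / w)  # perspective division


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
    print(project_point([0.0, 0.0, 2.0], identity, [0.0, 0.0, 0.0], K))  # exact pose
    print(project_point([0.0, 0.0, 2.0], identity, [0.1, 0.0, 0.0], K))  # perturbed pose
```

With a focal length of 100 pixels and a point 2 units in front of the camera, a 0.1-unit translation error shifts the projection by 100 × 0.1 / 2 = 5 pixels, so the features sampled there come from the wrong image location.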
Originality: The reviewer did not identify a comparable concept within the existing literature, suggesting that the idea presented in this paper is novel and distinctive. Furthermore, the concept exhibits significant potential for broader applications across various use cases, further underscoring its relevance and practical utility. Clarity: The problem statement, literature review, and a portion of the methods and experimental details are clear to me. Quality and significance: The paper exhi
1. Clarity in the exposition of the method's approach to generating predicted images during the optimization process would be beneficial. 2. It would be advantageous if the paper delved deeper into the reasons behind its enhanced speed and the trade-offs involved in achieving such acceleration. 3. The proposed methodology presents certain limitations, particularly concerning relative poses. A more detailed exploration of how the network achieves accurate scale predictions without pose informatio
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Human Pose and Action Recognition
