FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
Jiale Xu, Shenghua Gao, Ying Shan

TL;DR
FreeSplatter is a novel pose-free 3D reconstruction method that generates high-quality 3D models from uncalibrated sparse-view images and estimates camera poses efficiently, simplifying 3D content creation.
Contribution
It introduces a scalable transformer-based framework that jointly reconstructs 3D Gaussians and estimates camera poses from uncalibrated images, outperforming pose-dependent models.
Findings
Outperforms several pose-dependent Large Reconstruction Models.
Achieves comparable or better camera pose estimation accuracy than state-of-the-art pose-free methods.
Enables high-fidelity 3D modeling without requiring camera calibration.
Abstract
Sparse-view reconstruction models typically require precise camera poses, yet obtaining these parameters from sparse-view images remains challenging. We introduce FreeSplatter, a scalable feed-forward framework that generates high-quality 3D Gaussians from uncalibrated sparse-view images while estimating camera parameters within seconds. Our approach employs a streamlined transformer architecture where self-attention blocks facilitate information exchange among multi-view image tokens, decoding them into pixel-aligned 3D Gaussian primitives within a unified reference frame. This representation enables both high-fidelity 3D modeling and efficient camera parameter estimation using off-the-shelf solvers. We develop two specialized variants--for object-centric and scene-level reconstruction--trained on comprehensive datasets. Remarkably, FreeSplatter outperforms several pose-dependent Large…
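To illustrate why pixel-aligned predictions in a known reference frame make camera parameters recoverable, here is a minimal sketch, not the authors' code. It estimates a focal length in closed form from a point map, assuming a pinhole camera with square pixels, a principal point at the image center, and points expressed in the same view's camera frame. The function names `estimate_focal` and `make_point_map` are hypothetical, and the abstract does not say which solvers FreeSplatter actually uses.

```python
def estimate_focal(points, width, height):
    """Least-squares focal length (in pixels) from a pixel-aligned point map.

    points[v][u] is the (X, Y, Z) point predicted for pixel (u, v), expressed
    in that view's own camera frame (an assumption of this sketch).
    """
    # Assumed principal point: the image center.
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    num = den = 0.0
    for v in range(height):
        for u in range(width):
            X, Y, Z = points[v][u]
            if Z <= 0:
                continue  # skip points behind the camera
            a, b = X / Z, Y / Z
            # Pinhole model: u - cx = f * X/Z and v - cy = f * Y/Z.
            num += (u - cx) * a + (v - cy) * b
            den += a * a + b * b
    if den == 0.0:
        raise ValueError("degenerate point map")
    # Minimizer of sum((u - cx - f*a)^2 + (v - cy - f*b)^2) over f.
    return num / den


def make_point_map(focal, width, height):
    """Synthetic point map for a slanted surface seen by a pinhole camera."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    rows = []
    for v in range(height):
        row = []
        for u in range(width):
            Z = 2.0 + 0.1 * u + 0.05 * v  # arbitrary positive depth
            row.append(((u - cx) / focal * Z, (v - cy) / focal * Z, Z))
        rows.append(row)
    return rows


if __name__ == "__main__":
    point_map = make_point_map(50.0, 8, 6)
    print(estimate_focal(point_map, 8, 6))
```

This sketch covers only the focal length of a single view. Recovering the rotation and translation of the other views from their 2D-3D correspondences would typically be handed to a PnP-style solver, which is not shown here.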
Peer Reviews
Decision: Submitted to ICLR 2025
- This paper proposes a feed-forward pipeline for pose-free 3D reconstruction from sparse input images.
- It shows state-of-the-art performance in view synthesis and comparable performance in pose estimation.
- The results are well presented.
- The method section lacks detailed explanations of how the proposed method aligns with the paper's motivation and how it is intended to improve performance.
- The ablation studies are weak. The influence of the number of input views is of secondary importance given the paper's motivation and methodology. Reporting only a quantitative result from a plug-in/plug-out style experiment on the pixel-alignment loss is naive.
- Additional application examples are not aligned with the paper's methodolog
1. It's interesting to learn that "camera poses may not be essential for training high-quality and scalable large reconstruction models."
2. The proposed method is technically sound and elegantly designed.
3. The authors demonstrate that the proposed method outperforms baseline methods.
1. The idea of outputting point maps relative to the main (or first) view is not novel in either pose estimation [1] or unposed sparse-view reconstruction [2].
2. Extending LRM to an unposed sparse-view setting is also not new. PF-LRM seems highly similar to the proposed method, although there are some differences: (a) the proposed method predicts point maps in the coordinate frame of a reference view rather than in the object/world frame; (b) it uses Gaussian Splatting instead of NeRF; (c) PF-
1. This work wisely distills recent advances in 3D reconstruction, such as DUSt3R, 3DGS, and LRM, into one unified framework. Unlike prior works that directly integrate multiple pre-trained foundation models, this framework/model design is neat and effective.
2. The method considers both object-level and scene-level reconstruction, albeit in two separate pre-trained models, which again shows the effectiveness of the proposed pose-free reconstruction strategy. This is another advantage compared
1. Since the proposed method focuses on sparse-view reconstruction, it would be better if the authors also included a comparison with optimization-based sparse-view reconstruction methods, for example ReconFusion, GaussianObject, and InstantSplat.
2. L305-311 discusses the occlusion issue with Gaussian maps. Even though the authors propose a strategy to alleviate this problem, I assume the method could still suffer from missing Gaussian points in occluded areas.
Taxonomy
Topics: 3D Shape Modeling and Analysis · 3D Surveying and Cultural Heritage · Computer Graphics and Visualization Techniques
