SplatFormer: Point Transformer for Robust 3D Gaussian Splatting
Yutong Chen, Marko Mihajlovic, Xiyi Chen, Yiming Wang, Sergey Prokudin, and Siyu Tang

TL;DR
SplatFormer introduces a novel point transformer model that refines 3D Gaussian Splatting outputs, significantly enhancing rendering quality in out-of-distribution views for photorealistic 3D reconstruction.
Contribution
This work presents the first point transformer designed for 3D Gaussian Splatting, enabling effective refinement of initial reconstructions in a single pass for unseen views.
Findings
Achieves state-of-the-art performance in out-of-distribution view synthesis
Outperforms existing regularization and multi-scene methods
Effectively removes artifacts in challenging novel views
Abstract
3D Gaussian Splatting (3DGS) has recently transformed photorealistic reconstruction, achieving high visual fidelity and real-time performance. However, rendering quality significantly deteriorates when test views deviate from the camera angles used during training, posing a major challenge for applications in immersive free-viewpoint rendering and navigation. In this work, we conduct a comprehensive evaluation of 3DGS and related novel view synthesis methods under out-of-distribution (OOD) test camera scenarios. By creating diverse test cases with synthetic and real-world datasets, we demonstrate that most existing methods, including those incorporating various regularization techniques and data-driven priors, struggle to generalize effectively to OOD views. To address this limitation, we introduce SplatFormer, the first point transformer model specifically designed to operate on…
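As a rough illustration of the OOD test-camera setting described above, the sketch below places training cameras on a low-elevation ring and test cameras on a high-elevation ring around an object. The count of 32 low-elevation training views comes from the review excerpts on this page; the elevation angles, the margin, the z-up convention, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math

def camera_position(elevation_deg, azimuth_deg, radius=1.0):
    """Camera center on a sphere around the origin (z-up convention, assumed)."""
    el = math.radians(elevation_deg)
    az = math.radians(azimuth_deg)
    return (
        radius * math.cos(el) * math.cos(az),
        radius * math.cos(el) * math.sin(az),
        radius * math.sin(el),
    )

def ring_of_cameras(elevation_deg, count, radius=1.0):
    """Cameras at evenly spaced azimuths, all sharing one elevation."""
    return [
        camera_position(elevation_deg, 360.0 * i / count, radius)
        for i in range(count)
    ]

def elevation_of(position):
    """Recover the elevation angle (degrees) of a camera center."""
    x, y, z = position
    return math.degrees(math.atan2(z, math.hypot(x, y)))

# Hypothetical angles chosen for illustration only.
TRAIN_ELEVATION_DEG = 10.0
OOD_ELEVATION_DEG = 70.0

def is_ood(position, max_train_elevation_deg=TRAIN_ELEVATION_DEG, margin_deg=5.0):
    """Flag a camera as OOD if it sits well above the training elevation."""
    return elevation_of(position) > max_train_elevation_deg + margin_deg

train_cams = ring_of_cameras(TRAIN_ELEVATION_DEG, count=32)  # in-distribution views
ood_cams = ring_of_cameras(OOD_ELEVATION_DEG, count=8)       # held-out test views
```

Under this split, a reconstruction fitted only to `train_cams` never observes the object from above, which is the kind of viewpoint gap the evaluation targets.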
Peer Reviews
Decision · ICLR 2025 Spotlight
- The paper introduces a novel and important direction for rendering at unseen, highly relevant test views, addressing a significant gap in current 3D rendering research.
- By employing point transformers for aggregating Gaussian splats, the method offers a sound and efficient approach to achieve improved detail and visual fidelity.
- The paper does not explore the potential of generative priors for OOD-NVS, particularly the use of diffusion models to help hallucinate unseen views, which could improve novel view synthesis in a more principled way.
- The study is primarily focused on object-centric cases, despite the availability of scene-level 3D datasets (ScanNet, ScanNet++, BlendedMVS, MegaScenes, MegaDepth, MVImgNet). Expanding the scope to scene-wise data could provide a broader basis
1. The core contribution of this work, refining 3D Gaussians with a generalizable transformer, is meaningful.
2. The authors construct training and evaluation sets for the claimed OOD problem from the ShapeNet and Objaverse datasets.
3. Extensive experiments with different baselines on multiple datasets confirm that the proposed method clearly improves rendering performance under poses with large elevations.
My major concern is that some comparisons between the proposed method and the baselines may not be entirely fair. For example, the optimization of the proposed method uses 32 low-elevation views, while some methods, e.g., LaRa, take only 4 views as input. The lack of training views may naturally affect their performance. Can the proposed framework be applied directly to the 3D Gaussian primitives generated by LaRa? In this way, the performance of the proposed refinement might be evaluated m
- The newly presented problem, OOD-NVS, is of great value.
- Experiments are extensive and validate the performance of the proposed method well.
- SplatFormer achieves SOTA performance on various object-centric datasets in the OOD-NVS task compared to current related methods.
- Although some experiments on real-world datasets are conducted, all datasets involved are still mainly object-centric. It remains an open question whether this learning-based method can be applied to real-world, non-object-centric scenes with more complex foregrounds and backgrounds. Such data are much more difficult to collect than object-centric data, and also more difficult to process and use in training.
- Geometry results are not reported. Although there are many compariso
Taxonomy
Topics: Advanced Optical Sensing Technologies · Optical measurement and interference techniques · Advanced Measurement and Detection Methods
Methods: Sparse Evolutionary Training
