SplatFormer: Point Transformer for Robust 3D Gaussian Splatting

Yutong Chen, Marko Mihajlovic, Xiyi Chen, Yiming Wang, Sergey Prokudin, and Siyu Tang

arXiv:2411.06390·cs.CV·March 11, 2025

TL;DR

SplatFormer is a point transformer that refines the output of 3D Gaussian Splatting, improving rendering quality at out-of-distribution views, i.e., test views that deviate from the camera angles used during training.

Contribution

This work presents the first point transformer designed for 3D Gaussian Splatting. It refines an initial reconstruction in a single pass so that the result renders better at unseen views.
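To make the idea of a single-pass refinement concrete, here is a minimal NumPy sketch. It is not the SplatFormer architecture: the per-Gaussian attribute layout (3 scale, 4 rotation, 1 opacity, 3 color values), the k-nearest-neighbor dot-product attention, the residual update, and all function names and weight shapes are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

def knn_indices(xyz, k):
    """Indices of the k nearest Gaussians for each Gaussian (self included)."""
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]

def refine_gaussians(xyz, attrs, w_q, w_k, w_v, w_out, k=4):
    """One feed-forward pass: attend over local neighbors, add a residual to attrs."""
    idx = knn_indices(xyz, k)                       # (N, k)
    feats = np.concatenate([xyz, attrs], axis=1)    # (N, 3 + C)
    q = feats @ w_q                                 # (N, D)
    keys = feats[idx] @ w_k                         # (N, k, D)
    vals = feats[idx] @ w_v                         # (N, k, D)
    logits = np.einsum("nd,nkd->nk", q, keys) / np.sqrt(q.shape[1])
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    agg = np.einsum("nk,nkd->nd", weights, vals)    # (N, D)
    return attrs + agg @ w_out                      # residual update, shape (N, C)

# Hypothetical toy input: 16 Gaussians with an assumed 11-value attribute layout.
rng = np.random.default_rng(0)
N, C, D = 16, 11, 8
xyz = rng.normal(size=(N, 3))
attrs = rng.normal(size=(N, C))
w_q, w_k, w_v = (0.1 * rng.normal(size=(3 + C, D)) for _ in range(3))
w_out = 0.1 * rng.normal(size=(D, C))

refined = refine_gaussians(xyz, attrs, w_q, w_k, w_v, w_out)
```

Writing the update as a residual means that a zero output projection returns the initial reconstruction unchanged, which is one plausible reason to structure a refinement module this way.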

Findings

01

Achieves state-of-the-art performance in out-of-distribution view synthesis

02

Outperforms existing regularization and multi-scene methods

03

Effectively removes artifacts in challenging novel views

Abstract

3D Gaussian Splatting (3DGS) has recently transformed photorealistic reconstruction, achieving high visual fidelity and real-time performance. However, rendering quality significantly deteriorates when test views deviate from the camera angles used during training, posing a major challenge for applications in immersive free-viewpoint rendering and navigation. In this work, we conduct a comprehensive evaluation of 3DGS and related novel view synthesis methods under out-of-distribution (OOD) test camera scenarios. By creating diverse test cases with synthetic and real-world datasets, we demonstrate that most existing methods, including those incorporating various regularization techniques and data-driven priors, struggle to generalize effectively to OOD views. To address this limitation, we introduce SplatFormer, the first point transformer model specifically designed to operate on…
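The OOD setting described above can be illustrated with a small camera-sampling sketch. The figure of 32 low-elevation training views comes from the review excerpts further down this page; the elevation bands, the sphere radius, the number of test views, and the function name are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def cameras_on_sphere(n, elev_range_deg, radius=2.0, seed=0):
    """Sample n camera centers around the origin within an elevation band."""
    rng = np.random.default_rng(seed)
    azim = rng.uniform(0.0, 2.0 * np.pi, size=n)
    lo, hi = np.deg2rad(elev_range_deg)
    elev = rng.uniform(lo, hi, size=n)
    x = radius * np.cos(elev) * np.cos(azim)
    y = radius * np.cos(elev) * np.sin(azim)
    z = radius * np.sin(elev)
    return np.stack([x, y, z], axis=1), np.rad2deg(elev)

# Assumed split: low-elevation training views, high-elevation OOD test views.
train_centers, train_elev = cameras_on_sphere(32, (0.0, 15.0), seed=0)
test_centers, test_elev = cameras_on_sphere(8, (70.0, 90.0), seed=1)

# Elevation gap (in degrees) between the highest training view and the lowest test view.
gap = test_elev.min() - train_elev.max()
```

A positive `gap` means no test camera falls inside the elevation band seen during training, which is the kind of deviation the abstract refers to.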

Peer Reviews

Decision·ICLR 2025 Spotlight

Reviewer 01 · Rating 6 · Confidence 5

Strengths

- The paper introduces a novel and important direction for rendering at unseen, highly relevant test views, addressing a significant gap in current 3D rendering research.
- By employing point transformers for aggregating Gaussian splats, the method offers a sound and efficient approach to achieve improved detail and visual fidelity.

Weaknesses

- The paper does not explore generative priors for OOD-NVS, in particular diffusion models that could help hallucinate unseen views, which could improve novel view synthesis in a more reasonable way.
- The study focuses primarily on object-centric cases, despite the availability of scene-level 3D datasets (ScanNet, ScanNet++, BlendedMVS, MegaScenes, MegaDepth, MVImgNet). Expanding the scope to scene-wise data could provide a broader basis

Reviewer 02 · Rating 8 · Confidence 3

Strengths

1. The core contribution of this work, refining 3D Gaussians with a generalizable transformer, is meaningful.
2. The authors construct training and evaluation sets for the claimed OOD problem from the ShapeNet and Objaverse datasets.
3. Extensive experiments with different baselines on multiple datasets confirm that the proposed method clearly improves rendering performance under poses with large elevations.

Weaknesses

My major concern is that some comparisons between the proposed method and the baselines may not be entirely fair. For example, the proposed method is optimized with 32 low-elevation views, while some methods, e.g., LaRa, take only 4 views as input. Having fewer input views may naturally hurt their performance. Can the proposed framework be applied directly to the 3D Gaussian primitives generated by LaRa? In this way, the performance of the proposed refinement might be evaluated m

Reviewer 03 · Rating 8 · Confidence 4

Strengths

- The newly presented problem, OOD-NVS, is of great value.
- Experiments are extensive and validate the performance of the proposed method well.
- SplatFormer achieves SOTA performance on various object-centric datasets in the OOD-NVS task compared to current related methods.

Weaknesses

- Although some experiments use real-world datasets, all the datasets involved are still mainly object-centric. It remains an open question whether this learning-based method can be applied to real-world, non-object-centric scenes with more complex foregrounds and backgrounds. Such data are much more difficult to collect than object-centric data, and also more difficult to process and use in training.
- Geometry results are not reported. Although there are many compariso

Code & Models

Repositories

ChenYutongTHU/SplatFormer
PyTorch · Official

Videos

SplatFormer: Point Transformer for Robust 3D Gaussian Splatting · SlidesLive

Taxonomy

Topics: Advanced Optical Sensing Technologies · Optical measurement and interference techniques · Advanced Measurement and Detection Methods

Methods: Sparse Evolutionary Training