Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians
Erik Sandström, Keisuke Tateno, Michael Oechsle, Michael Niemeyer, Luc Van Gool, Martin R. Oswald, Federico Tombari

TL;DR
Splat-SLAM introduces a novel RGB-only SLAM system built on a globally optimized 3D Gaussian map. It achieves high-quality dense mapping and rendering with efficient runtime, matching or surpassing previous methods in accuracy while producing more compact maps.
Contribution
It is the first RGB-only SLAM system with a dense 3D Gaussian map that employs global optimization and dynamic map deformation for improved accuracy.
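This summary does not spell out how the map is deformed. As a rough illustration only, the sketch below assumes each Gaussian is anchored to one keyframe. When global optimization moves that keyframe's pose and rescales its depth, the Gaussian's mean is expressed in the old camera frame, scaled along its viewing ray by a per-Gaussian `depth_ratio` (new depth over old depth), and mapped back to the world with the new pose. Its orientation is rotated by the relative rotation, and its scale is multiplied by the same ratio. `Pose`, `Gaussian`, `deform`, and `depth_ratio` are hypothetical names, and the actual system may differ.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def matvec(A: Mat3, v: Vec3) -> Vec3:
    """3x3 matrix times 3-vector."""
    return tuple(sum(A[i][k] * v[k] for k in range(3)) for i in range(3))


def matmul(A: Mat3, B: Mat3) -> Mat3:
    """3x3 matrix product A @ B."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def transpose(A: Mat3) -> Mat3:
    return tuple(tuple(A[j][i] for j in range(3)) for i in range(3))


@dataclass
class Pose:
    """Hypothetical camera-to-world pose: x_world = R @ x_cam + t."""
    R: Mat3
    t: Vec3


@dataclass
class Gaussian:
    """Hypothetical minimal Gaussian: world-space mean, orientation, per-axis scale."""
    mean: Vec3
    rot: Mat3
    scale: Vec3


def deform(g: Gaussian, old: Pose, new: Pose, depth_ratio: float = 1.0) -> Gaussian:
    """Re-anchor a Gaussian after its keyframe's pose (and depth) was updated.

    depth_ratio = new_depth / old_depth at the Gaussian's pixel (assumed given).
    """
    # Express the mean in the keyframe's OLD camera frame: R_old^T (x - t_old).
    offset = tuple(g.mean[i] - old.t[i] for i in range(3))
    x_cam = matvec(transpose(old.R), offset)
    # Scale along the viewing ray so the Gaussian follows the depth update.
    x_cam = tuple(c * depth_ratio for c in x_cam)
    # Map back to world coordinates with the NEW pose.
    moved = matvec(new.R, x_cam)
    new_mean = tuple(moved[i] + new.t[i] for i in range(3))
    # Rotate the orientation by the relative rotation R_new @ R_old^T.
    new_rot = matmul(matmul(new.R, transpose(old.R)), g.rot)
    # Grow or shrink the Gaussian by the same depth ratio.
    new_scale = tuple(s * depth_ratio for s in g.scale)
    return Gaussian(new_mean, new_rot, new_scale)


# Usage: the keyframe is shifted along x and its depth grows by 50%.
I = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
g = Gaussian(mean=(0.0, 0.0, 2.0), rot=I, scale=(0.1, 0.1, 0.1))
g2 = deform(g, Pose(I, (0.0, 0.0, 0.0)), Pose(I, (1.0, 0.0, 0.0)), depth_ratio=1.5)
print(g2.mean)  # (1.0, 0.0, 3.0)
```

Scaling in camera coordinates keeps the Gaussian on the same pixel ray, so its projection into the updated keyframe is unchanged; only its depth and world position move.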
Findings
Achieves superior or comparable tracking and mapping accuracy.
Produces small, efficient map representations.
Operates with fast runtimes.
Abstract
3D Gaussian Splatting has emerged as a powerful representation of geometry and appearance for RGB-only dense Simultaneous Localization and Mapping (SLAM), as it provides a compact dense map representation while enabling efficient and high-quality map rendering. However, existing methods show significantly worse reconstruction quality than competing methods using other 3D representations, e.g. neural point clouds, since they either do not employ global map and pose optimization or make use of monocular depth. In response, we propose the first RGB-only SLAM system with a dense 3D Gaussian map representation that utilizes all benefits of globally optimized tracking by adapting dynamically to keyframe pose and depth updates by actively deforming the 3D Gaussian map. Moreover, we find that refining the depth updates in inaccurate areas with a monocular depth estimator further improves the…
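The abstract says only that a monocular depth estimator refines the depth updates in inaccurate areas. One common way to do this, stated here as an assumption rather than the paper's confirmed procedure, is to fit a per-frame scale and shift that aligns the monocular depth to the reliable multi-view depth by least squares, then use the aligned monocular depth wherever the multi-view estimate is flagged unreliable. How reliability is decided (for example, a multi-view consistency check) is left out; `fit_scale_shift`, `proxy_depth`, and the `valid` mask are hypothetical names.

```python
from typing import List, Sequence, Tuple


def fit_scale_shift(mono: Sequence[float], ref: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least squares for (s, b) minimizing sum (s*m + b - d)^2."""
    n = len(mono)
    if n < 2:
        raise ValueError("need at least two reliable pixels")
    sm, sd = sum(mono), sum(ref)
    smm = sum(m * m for m in mono)
    smd = sum(m * d for m, d in zip(mono, ref))
    denom = n * smm - sm * sm
    if denom == 0:
        raise ValueError("monocular depths are constant; scale is undetermined")
    s = (n * smd - sm * sd) / denom
    b = (sd - s * sm) / n
    return s, b


def proxy_depth(multiview: Sequence[float], mono: Sequence[float],
                valid: Sequence[bool]) -> List[float]:
    """Keep reliable multi-view depth; fill the rest with aligned monocular depth."""
    # Fit the alignment only on pixels whose multi-view depth is trusted.
    pairs = [(m, d) for m, d, ok in zip(mono, multiview, valid) if ok]
    s, b = fit_scale_shift([m for m, _ in pairs], [d for _, d in pairs])
    return [d if ok else s * m + b for d, m, ok in zip(multiview, mono, valid)]


# Toy example: 4 pixels, the last multi-view depth is flagged unreliable.
mono = [1.0, 2.0, 3.0, 4.0]        # relative (up-to-scale-and-shift) monocular depth
multiview = [2.5, 4.5, 6.5, 0.0]   # depth from multi-view optimization; last value bad
valid = [True, True, True, False]
print(proxy_depth(multiview, mono, valid))  # [2.5, 4.5, 6.5, 8.5]
```

Per-pixel multi-view depth is kept wherever it is trusted, so the monocular prior acts only as a gap filler and does not override accurate geometry.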
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. The system implementation is sophisticated and requires extensive effort. 2. Using monocular depth provides good regularization of the depth map, while depth is still computed per pixel, which preserves accuracy. 3. The performance, especially global consistency, is superior to other globally consistent 3DGS baselines.
1. The paper is more about a sophisticated system implementation for a specific problem and lacks insights or contributions to the understanding of the problem. Thus, it would be better suited to other venues such as CVPR or 3DV. 2. The term "Deformable" is somewhat confusing, since it widely refers to deformable structures in the computer vision literature. It would be better to replace it with another word. 3. Some important details are missing from Figure 2: - Ho
- The paper conducts experiments on both synthetic and real-world datasets for tracking and mapping evaluation.
- The paper is comprehensive and easy to understand.
- The paper achieves high performance compared with the baseline methods.
- Lack of novelty. The paper derives its methods largely from DROID-SLAM, especially the RAFT-based monocular tracking, and adds a depth estimation module previously introduced in GlORIE-SLAM, for which appropriate credit is lacking in the methods section. Moreover, although the proposed deformed map offers a partial solution to BA-induced inconsistencies, it relies heavily on the existing monocular tracking method and thus contributes little to addressing the core challe
1) This paper is the first RGB-only 3DGS-based SLAM system with loop closure, proxy depth, and online 3D Gaussian map deformations with improved map sizes and runtimes. In my opinion, this is a relatively comprehensive work in the field of 3DGS-based SLAM. As loop closure is a crucial challenge in SLAM, this work enables map deformations at loop closure and integrates global bundle adjustment. 2) Extensive evaluations across multiple datasets demonstrate accurate tracking and high-quality rende
1) It would be better to include visualizations that illustrate loop closure results. For example, presenting complete reconstruction results of selected scenes and comparing these results across your method, ground truth, and other approaches would offer valuable insights. 2) Minor: Wrong numbers are highlighted in Table 1. The section on Influence of Monocular Depth in the supplementary material is interesting, as it illustrates the upper bound of your approach. Since this content enhance
The proposed SLAM system combines the strengths of frame-to-frame tracking using recurrent dense optical flow with the fidelity of 3D Gaussians as the map representation, without depending on depth inputs. It performs better than existing RGB-only SLAM methods in tracking, mapping, and rendering accuracy and, more importantly, yields small map sizes and fast runtimes. The paper is well-written and concise, with excellent formatting of figures and formulas.
1. The authors claim that this paper proposes the first RGB-only SLAM system with a dense 3D Gaussian map that uses globally optimized tracking, adapting dynamically to keyframe pose and depth updates. Nevertheless, I argue that this claim is not entirely accurate, since certain prior studies, notably Photo-SLAM, also support monocular video. 2. Notably, the proposed method falls short of real-time performance, running at only 1.24 FPS on the simplest Replica room0 dataset
Taxonomy
Topics: Robotics and Sensor-Based Localization · Robotic Path Planning Algorithms · Modular Robots and Swarm Intelligence
