Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline
Linqing Zhao, Xiuwei Xu, Yirui Wang, Hao Wang, Wenzhao Zheng, Yansong Tang, Haibin Yan, Jiwen Lu

TL;DR
This paper introduces a fast, feed-forward RGB SLAM method that combines 3D Gaussian mapping with a recurrent pose prediction module, significantly improving speed while maintaining state-of-the-art accuracy in 3D reconstruction.
Contribution
It proposes an online 3D Gaussian-based SLAM system with a feed-forward recurrent pose predictor that infers camera pose from optical flow, replacing slow test-time optimization with fast inference for real-time RGB SLAM.
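As a rough illustration of what a feed-forward tracker yields, the sketch below chains per-frame relative poses into a camera trajectory without any iterative optimization. It is not the authors' implementation: the helper names (`make_pose`, `chain_poses`), the 4x4 homogeneous-matrix representation, and the convention that each relative pose maps points from frame k into frame k-1 are all assumptions made for illustration.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def make_pose(yaw, tx, ty, tz):
    """Hypothetical helper: rigid transform with a rotation about z and a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def chain_poses(relative_poses):
    """Accumulate relative poses into world-from-camera poses.

    Assumed convention: relative_poses[k] maps points from frame k+1
    into frame k, and frame 0 defines the world frame.
    """
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    trajectory = [identity]
    for rel in relative_poses:
        # One matrix product per frame: no iterative refinement is involved.
        trajectory.append(matmul4(trajectory[-1], rel))
    return trajectory


# Usage: a 90-degree turn combined with a 1-unit step along x, applied twice.
step = make_pose(math.pi / 2, 1.0, 0.0, 0.0)
trajectory = chain_poses([step, step])
```

In a real system each relative pose would come from the learned predictor, and the accumulated drift is the main cost of pure chaining.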
Findings
Achieves state-of-the-art accuracy on Replica and TUM-RGBD datasets.
Reduces tracking time by over 90%.
Demonstrates real-world deployment feasibility.
Abstract
Incrementally recovering real-sized 3D geometry from a pose-free RGB stream is a challenging task in 3D reconstruction, requiring minimal assumptions on input data. Existing methods can be broadly categorized into end-to-end and visual SLAM-based approaches, which either struggle with long sequences or depend on slow test-time optimization and depth sensors. To address this, we first integrate a depth estimator into an RGB-D SLAM system, but this approach is hindered by inaccurate geometric details in predicted depth. Through further investigation, we find that 3D Gaussian mapping can effectively solve this problem. Building on this, we propose an online 3D reconstruction method using 3D Gaussian-based SLAM, combined with a feed-forward recurrent prediction module to directly infer camera pose from optical flow. This approach replaces slow test-time optimization with fast…
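The abstract does not say how predicted depth is turned into 3D structure. One common way to seed a map from a depth image is pinhole back-projection, sketched below under that assumption. The function name `backproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative and not taken from the paper.

```python
def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-frame 3D points (pinhole model).

    depth: list of rows, one depth value per pixel (same unit as the output).
    fx, fy, cx, cy: assumed pinhole intrinsics in pixels.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Usage: a 2x2 pseudo-depth map with one invalid pixel.
pseudo_depth = [[2.0, 0.0],
                [1.0, 4.0]]
seeds = backproject_depth(pseudo_depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```

Each resulting point could serve as an initial Gaussian center. Errors in the predicted depth carry over directly into these positions, which is the weakness the abstract attributes to naively plugging a depth estimator into an RGB-D pipeline.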
