Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras
Huajian Huang, Longwei Li, Hui Cheng, and Sai-Kit Yeung

TL;DR
Photo-SLAM is a real-time SLAM system that combines explicit geometric features with learned implicit photometric features to achieve fast, photorealistic mapping on portable devices, outperforming state-of-the-art photorealistic-mapping SLAM systems.
Contribution
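It introduces a hyper primitives map, which holds explicit geometric features alongside learned implicit photometric features, and a Gaussian-Pyramid-based training method that progressively learns multi-level features. Together they enable efficient, high-quality photorealistic mapping.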
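The summary does not say what a single hyper primitive contains. As a hedged illustration only, the sketch below assumes each primitive pairs an explicit geometric part (a 3D position plus a binary keypoint descriptor, e.g. ORB) with an implicit photometric part modeled on 3D Gaussian parameters (scale, rotation quaternion, opacity, spherical-harmonic color coefficients). `HyperPrimitive`, `HyperPrimitivesMap`, and every field name and default value are hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical layout: field names, sizes, and defaults are illustrative
# assumptions, not the data structure used in Photo-SLAM.
@dataclass
class HyperPrimitive:
    # Explicit geometric part (assumed): what a feature-based tracker matches against.
    position: Tuple[float, float, float]
    descriptor: bytes  # e.g. a 32-byte binary descriptor such as ORB (assumption)
    # Implicit photometric part (assumed 3D-Gaussian-style parameters), optimized during mapping.
    scale: Tuple[float, float, float] = (0.01, 0.01, 0.01)
    rotation: Tuple[float, float, float, float] = (1.0, 0.0, 0.0, 0.0)  # unit quaternion (w, x, y, z)
    opacity: float = 0.1
    sh_coeffs: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])  # degree-0 RGB only


class HyperPrimitivesMap:
    """Toy container that exposes the two halves of each primitive separately."""

    def __init__(self) -> None:
        self.primitives: List[HyperPrimitive] = []

    def add_from_keypoint(self, position: Tuple[float, float, float], descriptor: bytes) -> HyperPrimitive:
        # Seed a new primitive from a triangulated feature point; photometric
        # fields start at their placeholder defaults and would be learned later.
        primitive = HyperPrimitive(position=position, descriptor=descriptor)
        self.primitives.append(primitive)
        return primitive

    def geometric_view(self) -> List[Tuple[Tuple[float, float, float], bytes]]:
        # Only the explicit part: positions and descriptors for localization.
        return [(p.position, p.descriptor) for p in self.primitives]

    def photometric_params(self) -> list:
        # Only the implicit part: parameters a differentiable renderer would optimize.
        return [(p.scale, p.rotation, p.opacity, p.sh_coeffs) for p in self.primitives]


if __name__ == "__main__":
    m = HyperPrimitivesMap()
    m.add_from_keypoint((0.0, 0.0, 1.0), bytes(32))
    print(len(m.geometric_view()), len(m.photometric_params()))  # 1 1
```

The split between `geometric_view` and `photometric_params` mirrors the division of labor the abstract describes: localization reads only the explicit part, while photorealistic mapping optimizes the implicit part.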
Findings
30% higher PSNR than state-of-the-art SLAM systems
Rendering hundreds of times faster than state-of-the-art SLAM systems
Runs in real time on embedded platforms such as the Jetson AGX Orin
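The PSNR finding is a relative gain on a logarithmic metric. A minimal pure-Python sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE), assuming images are flattened to equal-length lists of intensities in [0, max_value]:

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized, flattened images."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(psnr([0.5] * 4, [0.6] * 4))  # approx. 20 dB (MSE = 0.01)
```

Because PSNR is logarithmic, a 30% gain over a hypothetical 25 dB baseline (to 32.5 dB) means the mean squared error falls by a factor of 10^(7.5/10) ≈ 5.6, not by 30%. The 25 dB baseline is illustrative only, not a reported result.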
Abstract
The integration of neural rendering and SLAM systems has recently shown promising results in joint localization and photorealistic view reconstruction. However, existing methods, which rely fully on implicit representations, are so resource-hungry that they cannot run on portable devices, which deviates from the original intention of SLAM. In this paper, we present Photo-SLAM, a novel SLAM framework with a hyper primitives map. Specifically, we simultaneously exploit explicit geometric features for localization and learn implicit photometric features to represent the texture information of the observed environment. In addition to actively densifying hyper primitives based on geometric features, we further introduce a Gaussian-Pyramid-based training method to progressively learn multi-level features, enhancing photorealistic mapping performance. Extensive experiments with monocular,…
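The abstract names the Gaussian-Pyramid-based training method but not its details. The sketch below shows the general technique under stated assumptions: each pyramid level is produced by a separable 5-tap binomial blur (a common Gaussian approximation) followed by 2x subsampling, and a hypothetical schedule, `level_for_iteration`, supervises training on the coarsest level first and steps one level finer at a fixed interval. `iters_per_level` is an illustrative parameter, not a value from the paper, and images are plain nested lists of grayscale values to keep the example dependency-free.

```python
from typing import List

Image = List[List[float]]

# 5-tap binomial kernel, a standard discrete approximation of a Gaussian (weights sum to 1).
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def _blur_1d(row: List[float]) -> List[float]:
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)  # clamp indices at the borders
            acc += w * row[j]
        out.append(acc)
    return out


def gaussian_blur(img: Image) -> Image:
    rows = [_blur_1d(r) for r in img]                # horizontal pass
    cols = [_blur_1d(list(c)) for c in zip(*rows)]   # vertical pass (on columns)
    return [list(r) for r in zip(*cols)]             # transpose back to rows


def downsample(img: Image) -> Image:
    return [row[::2] for row in img[::2]]            # keep every other row and column


def gaussian_pyramid(img: Image, levels: int) -> List[Image]:
    """Level 0 is full resolution; each further level is blurred, then halved."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(gaussian_blur(pyramid[-1])))
    return pyramid


def level_for_iteration(iteration: int, levels: int, iters_per_level: int) -> int:
    """Hypothetical coarse-to-fine schedule: start at the coarsest level and move
    one level finer every `iters_per_level` iterations, ending at full resolution."""
    return max(levels - 1 - iteration // iters_per_level, 0)


if __name__ == "__main__":
    img = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
    pyr = gaussian_pyramid(img, levels=3)
    print([(len(level), len(level[0])) for level in pyr])  # [(8, 8), (4, 4), (2, 2)]
```

Under this assumed schedule, the rendered image at iteration `i` would be compared against `pyramid[level_for_iteration(i, levels, iters_per_level)]`, with the render produced at the matching resolution, so low-frequency structure is fitted before fine texture.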
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Neural Network Applications · Advanced Vision and Imaging
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
