DINO-SLAM: DINO-informed RGB-D SLAM for Neural Implicit and Explicit Representations
Ziren Gong, Xiaohan Li, Fabio Tosi, Youmin Zhang, Stefano Mattoccia, Jun Wu, Matteo Poggi

TL;DR
DINO-SLAM is a DINO-informed design strategy for RGB-D SLAM: a Scene Structure Encoder (SSE) enriches DINO features into Enhanced DINO (EDINO) features, which are integrated into both NeRF-based and 3DGS-based SLAM pipelines and outperform state-of-the-art methods on Replica, ScanNet, and TUM.
Contribution
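The paper proposes EDINO features, obtained by enriching DINO features with a Scene Structure Encoder (SSE), and two SLAM paradigms (one NeRF-based, one 3DGS-based) that integrate them, adding hierarchical structural information to neural scene representations.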
Findings
Outperforms state-of-the-art methods on Replica, ScanNet, and TUM datasets.
Enriches scene features with hierarchical structural relationships.
Improves neural implicit and explicit SLAM representations.
Abstract
This paper presents DINO-SLAM, a DINO-informed design strategy to enhance neural implicit (Neural Radiance Field, NeRF) and explicit (3D Gaussian Splatting, 3DGS) representations in SLAM systems through more comprehensive scene representations. To this end, we rely on a Scene Structure Encoder (SSE) that enriches DINO features into Enhanced DINO (EDINO) features, capturing hierarchical scene elements and their structural relationships. Building on this, we propose two foundational paradigms that integrate EDINO features into NeRF and 3DGS SLAM systems. Our DINO-informed pipelines outperform state-of-the-art methods on the Replica, ScanNet, and TUM datasets.
Taxonomy
Topics: Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis · Robot Manipulation and Learning
