SCE-SLAM: Scale-Consistent Monocular SLAM via Scene Coordinate Embeddings

Yuchen Wu; Jiahe Li; Xiaohan Yu; Lina Yu; Jin Zheng; Xiao Bai

arXiv:2601.09665·cs.CV·January 15, 2026

Open Access

TL;DR

SCE-SLAM is a monocular SLAM system that maintains scale consistency across large scenes by learning scene coordinate embeddings, reducing scale drift and improving trajectory accuracy while running in real time.

Contribution

The paper presents an end-to-end SLAM framework that uses scene coordinate embeddings and geometry-guided aggregation to enforce scale consistency, addressing the scale drift that frame-to-frame methods accumulate because their independent local windows lack global constraints.

Findings

01 Reduces absolute trajectory error by 8.36 m on the KITTI dataset.

02 Maintains 36 FPS in large-scale scenes.

03 Achieves scale consistency across diverse environments.
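
For context on the first finding, absolute trajectory error (ATE) is commonly reported as the RMSE between estimated and ground-truth camera positions after a single global similarity alignment, since a monocular trajectory has no metric scale of its own. The sketch below shows that common convention using Umeyama alignment in NumPy. The paper's exact evaluation protocol is not stated on this page, so the function names and the choice of alignment here are assumptions for illustration only.

```python
import numpy as np

def umeyama_alignment(est, gt):
    """Least-squares similarity transform (s, R, t) mapping est onto gt.

    est, gt: (N, 3) arrays of corresponding camera positions.
    Hypothetical helper, not taken from the paper.
    """
    n = est.shape[0]
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e, g = est - mu_e, gt - mu_g
    cov = g.T @ e / n                      # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                     # guard against a reflection
    R = U @ S @ Vt
    var_e = (e ** 2).sum() / n             # variance of the estimated points
    s = np.trace(np.diag(D) @ S) / var_e   # single global scale factor
    t = mu_g - s * R @ mu_e
    return s, R, t

def ate_rmse(est, gt):
    """RMSE of position error after one global Sim(3) alignment."""
    s, R, t = umeyama_alignment(est, gt)
    aligned = s * (est @ R.T) + t
    return float(np.sqrt(((aligned - gt) ** 2).sum(axis=1).mean()))
```

Because only one global scale is fitted, a trajectory whose scale changes over time cannot be fully aligned, and the residual shows up as a larger ATE. That is why scale drift and ATE are closely linked for monocular systems.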

Abstract

Monocular visual SLAM enables 3D reconstruction from internet video and autonomous navigation on resource-constrained platforms, yet suffers from scale drift, i.e., the gradual divergence of estimated scale over long sequences. Existing frame-to-frame methods achieve real-time performance through local optimization but accumulate scale drift due to the lack of global constraints among independent windows. To address this, we propose SCE-SLAM, an end-to-end SLAM system that maintains scale consistency through scene coordinate embeddings, which are learned patch-level representations encoding 3D geometric relationships under a canonical scale reference. The framework consists of two key modules: geometry-guided aggregation that leverages 3D spatial proximity to propagate scale information from historical observations through geometry-modulated attention, and scene coordinate bundle…
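
One way to picture the geometry-modulated attention mentioned in the abstract is ordinary dot-product attention whose logits are penalized by 3D distance, so that spatially nearby historical patches contribute more. The sketch below is a guess at the general idea, not the paper's formulation: the Gaussian distance penalty, the `sigma` parameter, and every function and argument name are assumptions.

```python
import numpy as np

def geometry_modulated_attention(q_feat, q_xyz, k_feat, k_xyz, sigma=1.0):
    """Attention from current patches (queries) to historical patches (keys).

    q_feat: (Nq, D) current patch embeddings
    q_xyz:  (Nq, 3) estimated 3D positions of current patches
    k_feat: (Nk, D) historical patch embeddings
    k_xyz:  (Nk, 3) 3D positions of historical patches
    sigma:  assumed length scale of the spatial proximity bias
    """
    d = q_feat.shape[1]
    # Appearance term: scaled dot-product similarity.
    logits = q_feat @ k_feat.T / np.sqrt(d)
    # Geometry term (assumed form): squared 3D distance as a negative bias.
    diff = q_xyz[:, None, :] - k_xyz[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    logits = logits - sq_dist / (2.0 * sigma ** 2)
    # Numerically stable softmax over the historical patches.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Aggregate historical embeddings into each current patch.
    return weights @ k_feat, weights
```

With identical appearance features, the weights depend only on distance, so a historical patch 0.1 units away receives far more weight than one 5 units away. A smaller `sigma` makes the aggregation more local.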

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques