Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry

Takayuki Kanai; Igor Vasiljevic; Vitor Guizilini; Kazuhiro Shintani

arXiv:2406.00929 · cs.CV · September 30, 2025

TL;DR

This paper introduces a self-supervised, geometry-guided initialization method for monocular visual odometry that enhances robustness and accuracy, especially in challenging outdoor scenarios with large motions and dynamic objects.

Contribution

The paper proposes using a frozen, large-scale pre-trained monocular depth estimator to improve dense SLAM initialization, with no additional fine-tuning.
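
As a rough illustration of what seeding dense SLAM from a depth prior could look like, the sketch below converts a predicted depth map into an initial inverse-depth map. It is not the paper's implementation: the function name `init_inverse_depth`, the clamping range, and the fallback value are all hypothetical, and it assumes the SLAM back end optimizes per-pixel inverse depth.

```python
from typing import List

DepthMap = List[List[float]]


def init_inverse_depth(
    pred_depth: DepthMap,
    min_depth: float = 0.1,   # hypothetical near clamp
    max_depth: float = 80.0,  # hypothetical far clamp
    fallback: float = 1.0,    # hypothetical constant used for invalid pixels
) -> DepthMap:
    """Turn a predicted depth map into an initial inverse-depth map.

    Valid depths are clamped to [min_depth, max_depth] and inverted.
    NaN or non-positive predictions fall back to a constant inverse depth.
    """
    inverse: DepthMap = []
    for row in pred_depth:
        out_row = []
        for d in row:
            if d != d or d <= 0.0:  # d != d is true only for NaN
                out_row.append(fallback)
            else:
                clamped = min(max(d, min_depth), max_depth)
                out_row.append(1.0 / clamped)
        inverse.append(out_row)
    return inverse


if __name__ == "__main__":
    # Toy 2x2 "prediction" standing in for the output of a frozen depth network.
    toy_prediction = [[2.0, 0.0], [1000.0, 0.05]]
    print(init_inverse_depth(toy_prediction))
```

The fallback keeps invalid predictions from producing infinite inverse depths. A real system would also have to reconcile the scale of the depth prior with the scale of the estimated trajectory, which this sketch leaves out.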

Findings

01. Significant improvements on the KITTI odometry benchmark.

02. Enhanced robustness in scenarios with large motion and dynamic objects.

03. An effective initialization method for dense SLAM models.

Abstract

Monocular visual odometry is a key technology in various autonomous systems. Traditional feature-based methods suffer from failures due to poor lighting, insufficient texture, and large motions. In contrast, recent learning-based dense SLAM methods exploit iterative dense bundle adjustment to address such failure cases, and achieve robust and accurate localization in a wide variety of real environments, without depending on domain-specific supervision. However, despite their potential, these methods still struggle with scenarios involving large motion and object dynamics. In this study, we diagnose key weaknesses in a popular learning-based dense SLAM model (DROID-SLAM) by analyzing major failure cases on outdoor benchmarks and exposing various shortcomings of its optimization process. We then propose the use of self-supervised priors leveraging a frozen large-scale pre-trained monocular…

Code & Models

Repositories

TRI-ML/vidar (PyTorch)

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Image and Object Detection Techniques