MDE-VIO: Enhancing Visual-Inertial Odometry Using Learned Depth Priors

Arda Alniak; Sinan Kalkan; Mustafa Mert Ankarali; Afsar Saranli; Abdullah Aydin Alatan

arXiv:2602.11323·cs.CV·February 13, 2026

Open Access

TL;DR

This paper introduces a real-time capable method that integrates learned depth priors into monocular VIO systems, improving accuracy and robustness in low-texture environments while maintaining computational efficiency for edge devices.

Contribution

It presents a novel framework that enforces depth consistency and filters artifacts, enabling dense depth integration into VIO within edge device constraints.

Findings

01. Reduces Absolute Trajectory Error by up to 28.3%

02. Prevents divergence in challenging scenarios

03. Maintains real-time performance on edge devices

Abstract

Traditional monocular Visual-Inertial Odometry (VIO) systems struggle in low-texture environments where sparse visual features are insufficient for accurate pose estimation. To address this, dense Monocular Depth Estimation (MDE) has been widely explored as a complementary information source. While recent complex foundation models based on Vision Transformers (ViT) offer dense, geometrically consistent depth, their computational demands typically preclude real-time edge deployment. Our work bridges this gap by integrating learned depth priors directly into the VINS-Mono optimization backend. We propose a novel framework that enforces affine-invariant depth consistency and pairwise ordinal constraints, explicitly filtering unstable artifacts via variance-based gating. This approach strictly adheres to the computational limits of edge devices while robustly recovering metric…
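To make the three ingredients named in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`fit_scale_shift`, `ordinal_violations`), the `max_var` threshold, and the toy data are all hypothetical, and the abstract does not say whether the paper works in depth or inverse depth. The sketch assumes that "affine-invariant" means the learned depth is trusted only up to an unknown scale `s` and shift `t`, that gating means discarding samples whose variance exceeds a threshold, and that an ordinal constraint only checks which of two points is farther away.

```python
from statistics import mean


def fit_scale_shift(pred, ref, var, max_var):
    """Least-squares fit of ref ~= s * pred + t using only gated samples.

    pred: learned (relative) depths, ref: metric depths of the same points,
    var: per-sample variance, max_var: hypothetical gating threshold.
    """
    # Variance-based gating: keep only samples considered stable.
    kept = [(p, r) for p, r, v in zip(pred, ref, var) if v <= max_var]
    if len(kept) < 2:
        raise ValueError("need at least two gated samples")
    p_mean = mean(p for p, _ in kept)
    r_mean = mean(r for _, r in kept)
    sxx = sum((p - p_mean) ** 2 for p, _ in kept)
    if sxx == 0.0:
        raise ValueError("predicted depths are constant; scale is unobservable")
    sxy = sum((p - p_mean) * (r - r_mean) for p, r in kept)
    s = sxy / sxx              # closed-form slope of a 1-D linear regression
    t = r_mean - s * p_mean    # matching intercept
    return s, t


def ordinal_violations(pred, ref, pairs):
    """Count index pairs whose near/far ordering differs between pred and ref."""
    return sum(
        1 for i, j in pairs if (pred[i] - pred[j]) * (ref[i] - ref[j]) < 0
    )


# Toy data (made up): the first four points satisfy ref = 2 * pred + 1,
# the fifth is an unstable artifact with high variance.
pred = [0.1, 0.2, 0.4, 0.8, 0.5]
ref = [1.2, 1.4, 1.8, 2.6, 9.0]
var = [0.01, 0.01, 0.02, 0.01, 0.9]

s, t = fit_scale_shift(pred, ref, var, max_var=0.1)
n_bad = ordinal_violations(pred, ref, [(0, 1), (1, 2), (2, 3), (3, 4)])
```

In this toy case the gated fit ignores the fifth point, so the recovered scale and shift match the four consistent samples, and the only ordering disagreement comes from the pair that includes the artifact. In a real VIO backend these quantities would enter the optimization as residuals rather than being solved in isolation; how the paper does that is not stated in the text above.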

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Robot Manipulation and Learning